Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/08/13 08:47:00 UTC
[jira] [Created] (ARROW-9718) [Python] Make pyarrow.parquet work with the new filesystem interfaces
Joris Van den Bossche created ARROW-9718:
--------------------------------------------
Summary: [Python] Make pyarrow.parquet work with the new filesystem interfaces
Key: ARROW-9718
URL: https://issues.apache.org/jira/browse/ARROW-9718
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche
The place where the "legacy" {{pyarrow.filesystem}} filesystems are still used internally is the {{pyarrow.parquet}} module.
They are used in:
- ParquetWriter
- ParquetManifest/ParquetDataset
- write_to_dataset
For {{ParquetWriter}}, we need to update this to work with the new filesystems (since {{ParquetWriter}} is not dataset-related, and thus won't be deprecated).
For {{ParquetManifest}}/{{ParquetDataset}}, it might not need to be updated, since those might get deprecated themselves (to be discussed), and when using the {{use_legacy_dataset=False}} option, it already uses the new datasets.
For {{write_to_dataset}}, this might depend on how the writing capabilities of the dataset project evolve.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)