Posted to issues@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2019/11/09 18:01:00 UTC

[jira] [Commented] (ARROW-7102) Make filesystem wrappers compatible with fsspec

    [ https://issues.apache.org/jira/browse/ARROW-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970902#comment-16970902 ] 

Antoine Pitrou commented on ARROW-7102:
---------------------------------------

Hmm, so, to make sure there's no misunderstanding, we have been developing a C++ filesystem layer that's pretty much functional now, with local, mock and S3 implementations.

There's also a Python wrapping layer for the C++ filesystem layer. It's in {{pyarrow.fs}}, not {{pyarrow.filesystem}}, which we now consider legacy.

Going forward, two things could be useful:
* Make a {{fsspec}} wrapper for {{pyarrow.fs}}
* Make a {{pyarrow.fs}} wrapper for {{fsspec}}
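
To illustrate the second direction, here is a minimal, dependency-free sketch of the adapter idea. It is not the actual {{pyarrow.fs}} or {{fsspec}} API: {{ToyFsspecLikeFS}} is a hypothetical in-memory stand-in that only mimics the {{info}}/{{open}} conventions of an fsspec-style filesystem, and the adapter method names {{get_file_info}} and {{open_input_stream}} are assumptions chosen for illustration.

```python
import io
from dataclasses import dataclass


class ToyFsspecLikeFS:
    """Hypothetical in-memory stand-in for an fsspec-style filesystem."""

    def __init__(self, files):
        # Maps path -> bytes content.
        self._files = dict(files)

    def info(self, path):
        # fsspec-style convention: describe an entry as a plain dict.
        if path not in self._files:
            raise FileNotFoundError(path)
        return {"name": path, "size": len(self._files[path]), "type": "file"}

    def open(self, path, mode="rb"):
        # Only binary reads are supported in this sketch.
        if mode != "rb":
            raise ValueError(f"unsupported mode: {mode}")
        if path not in self._files:
            raise FileNotFoundError(path)
        return io.BytesIO(self._files[path])


@dataclass
class FileInfo:
    """Hypothetical typed result, instead of a free-form dict."""
    path: str
    size: int
    is_file: bool


class ArrowStyleAdapter:
    """Wraps an fsspec-style object behind a different (assumed) interface."""

    def __init__(self, fs):
        self._fs = fs

    def get_file_info(self, path):
        # Translate the dict returned by info() into a typed object.
        info = self._fs.info(path)
        return FileInfo(info["name"], info["size"], info["type"] == "file")

    def open_input_stream(self, path):
        # Delegate to the wrapped filesystem's binary read mode.
        return self._fs.open(path, "rb")


toy = ToyFsspecLikeFS({"bucket/data.bin": b"hello arrow"})
adapter = ArrowStyleAdapter(toy)
info = adapter.get_file_info("bucket/data.bin")
with adapter.open_input_stream("bucket/data.bin") as stream:
    payload = stream.read()
```

The first direction would be the mirror image: a class exposing {{info}}/{{open}} that delegates to a {{pyarrow.fs}} filesystem object.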

You can find the C++ filesystem API doc here:
https://arrow.apache.org/docs/cpp/api/filesystem.html#_CPPv4N5arrow2fs10FileSystemE
The Python wrappers are not documented on the website yet. But there are docstrings that you can also find here:
https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L172


> Make filesystem wrappers compatible with fsspec
> -----------------------------------------------
>
>                 Key: ARROW-7102
>                 URL: https://issues.apache.org/jira/browse/ARROW-7102
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Tom Augspurger
>            Priority: Major
>              Labels: FileSystem
>
> [fsspec|https://filesystem-spec.readthedocs.io/en/latest] defines a common API for a variety of filesystem implementations. I'm proposing an FSSpecWrapper, similar to S3FSWrapper, that works with any fsspec implementation.
>  
> Right now, pyarrow has a pyarrow.filesystem.S3FSWrapper, which is specific to s3fs: [https://github.com/apache/arrow/blob/21ad7ac1162eab188a1e15923fb1de5b795337ec/python/pyarrow/filesystem.py#L320]. This implementation could be removed entirely once an FSSpecWrapper is done, or kept as an alias if it's part of the public API.
>  
> This is related to ARROW-3717, which requested a GCSFSWrapper for working with Google Cloud Storage.


