Posted to dev@arrow.apache.org by Weston Pace <we...@gmail.com> on 2020/08/19 21:18:18 UTC

Questions about S3 options

To use S3 it appears I can either use `pyarrow.fs.S3FileSystem` or I
can use s3fs and it gets wrapped (I think) with
`pyarrow.fs.DaskFileSystem`.  However, I don't see any documentation
for `pyarrow.fs.DaskFileSystem`.  Is this option supported going
forwards?  I'm currently configuring an s3fs instance for S3 access
elsewhere and so I'd rather reuse this if possible.

-Weston Pace

Re: Questions about S3 options

Posted by Joris Van den Bossche <jo...@gmail.com>.
Hi Weston,

Sorry for the late reply. For using S3 in pyarrow, there are indeed 2
options: using the implementation provided by arrow
(`pyarrow.fs.S3FileSystem`) or using s3fs which gets wrapped by
pyarrow.
Note that the wrapper is not actually DaskFileSystem: for the legacy
filesystems we use s3fs directly, for the new filesystems it gets
wrapped using `FSSpecHandler`.
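
For reference, a minimal sketch of the two options (the region, bucket
name, and path below are hypothetical placeholders, and s3fs must be
installed separately for the second option):

    import pyarrow.fs
    import pyarrow.dataset as ds

    # Option 1: Arrow's built-in S3 implementation
    s3 = pyarrow.fs.S3FileSystem(region="us-east-1")

    # Option 2: reuse an existing s3fs instance by wrapping it
    # for the new filesystem API
    import s3fs
    from pyarrow.fs import PyFileSystem, FSSpecHandler
    wrapped = PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))

    # Either object can then be passed wherever pyarrow accepts a
    # filesystem, e.g. when reading a dataset:
    dataset = ds.dataset("my-bucket/path/to/data", format="parquet",
                         filesystem=s3)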

Both options are supported going forward. It might be that at some
point the built-in one will become more tightly integrated, since s3fs
is only used through a generic wrapper.

Best,
Joris

On Wed, 19 Aug 2020 at 23:18, Weston Pace <we...@gmail.com> wrote:
>
> To use S3 it appears I can either use `pyarrow.fs.S3FileSystem` or I
> can use s3fs and it gets wrapped (I think) with
> `pyarrow.fs.DaskFileSystem`.  However, I don't see any documentation
> for `pyarrow.fs.DaskFileSystem`.  Is this option supported going
> forwards?  I'm currently configuring an s3fs instance for S3 access
> elsewhere and so I'd rather reuse this if possible.
>
> -Weston Pace