Posted to jira@arrow.apache.org by "Zac Davies (Jira)" <ji...@apache.org> on 2022/10/03 08:18:00 UTC

[jira] [Commented] (ARROW-14998) [R] Support for HTTPS Filesystem access

    [ https://issues.apache.org/jira/browse/ARROW-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612210#comment-17612210 ] 

Zac Davies commented on ARROW-14998:
------------------------------------

This would be great. As far as I can tell, it is required in order to access pre-signed S3 URLs as a Dataset without first downloading or syncing all the files so that the Dataset sits on the local filesystem.
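
For anyone following along, here is a rough sketch of the range-request idea mentioned in the issue description quoted below. It is written in Python rather than R only for illustration, and it is not how arrow does or would implement this. It relies on the Parquet layout, in which a file ends with a 4-byte little-endian footer length followed by the magic bytes "PAR1", so a reader can fetch just the tail and then just the footer. All function names are hypothetical, and whether a given server or pre-signed URL honors the Range header is an assumption to verify per endpoint.

```python
import struct
import urllib.request

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte footer length + 4-byte magic at the end of the file


def suffix_range_header(n_bytes):
    """Build a header asking for only the last n_bytes of a resource."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return {"Range": f"bytes=-{n_bytes}"}


def span_range_header(start, length):
    """Build a header asking for `length` bytes starting at offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be positive")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={start}-{start + length - 1}"}


def parse_parquet_tail(tail):
    """Return the footer metadata length encoded in the last 8 bytes."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len


def footer_range_header(file_size, footer_len):
    """Header selecting exactly the footer metadata, given the total size."""
    start = file_size - TAIL_SIZE - footer_len
    return span_range_header(start, footer_len)


def fetch_range(url, headers):
    """Fetch part of a resource (hypothetical helper, needs network access).

    A server that honors the Range header answers with status 206; a
    server that ignores it typically answers 200 with the full body.
    """
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError("server did not honor the range request")
        return response.read()
```

With helpers like these, reading the metadata of a remote file would take two small requests (the 8-byte tail, then the footer) instead of a full download, which is the property that would make a Dataset over plain HTTPS or pre-signed URLs practical.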

> [R] Support for HTTPS Filesystem access
> ---------------------------------------
>
>                 Key: ARROW-14998
>                 URL: https://issues.apache.org/jira/browse/ARROW-14998
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: R
>            Reporter: Carl Boettiger
>            Priority: Major
>
> Thanks for such an amazing project. I've been entirely blown away by the S3 Filesystem access in the latest release, and I'm excited to see other backends like Azure being discussed in the issues.  As you know, many HTTPS servers also support range requests, meaning (I think) that it should be possible to access public data (parquet, csv files) over generic HTTPS connections too.
> As you probably know, duckdb already has support for HTTPS-based remote file access, e.g. https://github.com/duckdb/duckdb/blob/master/test/sql/copy/parquet/test_parquet_remote.test (though it is not available out-of-the-box in the R client there either).
>  
> It would be wonderful to have similar remote filesystem access in arrow that could work over HTTPS like that.  (I gather that on the Python side, fsspec already gives access to a wide range of such abstractions, but we're more limited in R so far. The exception is geospatial data, where bindings to GDAL mean we can access GDAL's rather amazing virtual file systems over HTTPS, S3, FTP, etc.: https://gdal.org/user/virtual_file_systems.html. That is a nice array-data complement to the more database-oriented workflow of arrow...)
>  
> Thanks for considering!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)