Posted to issues@arrow.apache.org by "Carl Boettiger (Jira)" <ji...@apache.org> on 2022/05/19 19:26:00 UTC

[jira] [Created] (ARROW-16619) read_csv_arrow / open_dataset over https connection?

Carl Boettiger created ARROW-16619:
--------------------------------------

             Summary: read_csv_arrow / open_dataset over https connection?
                 Key: ARROW-16619
                 URL: https://issues.apache.org/jira/browse/ARROW-16619
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
            Reporter: Carl Boettiger


Currently, remote access to data (particularly lazy reads, an immensely powerful Arrow ability) only works for data in an S3-compatible object store (though I know Azure support is in the works). It would be fantastic if we could also have remote access over HTTPS (I think this already works on the Python side thanks to fsspec).

For example, this fails in arrow but works in readr:


# fails:
arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")

# works:
readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
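Until HTTPS is supported natively, one possible stopgap is to download the file to a temporary path and hand that local path to `read_csv_arrow()`. This is a minimal sketch, not an Arrow feature. `read_csv_arrow_url()` is a hypothetical helper name. The sketch assumes Arrow infers gzip compression from a local `.gz` file extension.

```r
library(arrow)

# Hypothetical helper (not part of arrow): download a remote CSV to a
# temporary file, then read it with Arrow's CSV reader.
read_csv_arrow_url <- function(url, ...) {
  # Keep the remote extension (e.g. ".csv.gz") so Arrow can detect
  # compression from the local file name
  ext <- sub("^[^.]*", "", basename(url))
  tmp <- tempfile(fileext = ext)
  # Delete the temporary copy once the data has been read into memory
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps binary (gzip) bytes untouched on Windows
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  read_csv_arrow(tmp, ...)
}

# Usage with the file from this issue (requires network access):
# targets <- read_csv_arrow_url(
#   "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
```

Note that this fetches the whole file up front, so it gives none of the lazy-read benefits this issue is asking for.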

I think this ability would be even more compelling in `open_dataset()`, since it would bring the full power of lazy reads to remote data. Most web servers support HTTP range requests (the kind curl can issue), so it seems this should be possible. (We can already do something similar from DuckDB + R, but only after manually opting into the httpfs extension, and only for Parquet.)
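Whether a given server honors range requests can be checked directly. The minimal sketch below uses the curl R package. `fetch_byte_range()` is a hypothetical helper, not an arrow or curl API. It requests a single byte range, which is the basic operation a lazy reader would need to fetch only the parts of a file it touches.

```r
library(curl)

# Hypothetical helper (not part of arrow or curl): fetch bytes
# [start, end] (inclusive) of a remote file via a Range request.
fetch_byte_range <- function(url, start, end) {
  h <- new_handle()
  # Sets libcurl's CURLOPT_RANGE; over HTTP(S) this sends
  # "Range: bytes=start-end"
  handle_setopt(h, range = sprintf("%.0f-%.0f", start, end))
  res <- curl_fetch_memory(url, handle = h)
  # Over HTTP(S): 206 (Partial Content) means the range was honored,
  # while 200 means the server ignored it and sent the whole file
  list(status = res$status_code, bytes = res$content)
}

# Usage against the server from this issue (requires network access):
# res <- fetch_byte_range(
#   "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz",
#   0, 1023)
# res$status == 206
```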



--
This message was sent by Atlassian Jira
(v8.20.7#820007)