Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/27 16:27:22 UTC

[GitHub] [arrow] pachadotdev commented on a change in pull request #10546: ARROW-12845: [R] [C++] S3 connections for different providers

pachadotdev commented on a change in pull request #10546:
URL: https://github.com/apache/arrow/pull/10546#discussion_r677612740



##########
File path: r/vignettes/fs.Rmd
##########
@@ -128,3 +128,74 @@ s3://minioadmin:minioadmin@?scheme=http&endpoint_override=localhost%3A9000
 
 Among other applications, this can be useful for testing out code locally before
 running on a remote S3 bucket.
+
+## Non-AWS S3 cloud alternatives (DigitalOcean, IBM, Alibaba, and others)
+
+*This section adapts some elements from [Analyzing Room Temperature Data](https://www.jaredlander.com/2021/03/analyzing-room-temperature-data/#getting-the-data) by Jared Lander.*
+
+If you are using any S3-compatible storage provider, such as AWS, Alibaba, 
+Ceph, DigitalOcean, Dreamhost, IBM COS, MinIO, or others, you can connect to it 
+with `arrow` by using `S3FileSystem$create()`, just as in the local MinIO 
+example above. DigitalOcean is used here only as an example; any other 
+S3-compatible service can be used instead.
+
+At the beginning of this vignette we used:
+
+```r
+june2019 <- SubTreeFileSystem$create("s3://ursa-labs-taxi-data/2019/06")
+```
+
+This connects to AWS, and the same approach can be adapted for other providers. For 
+instructional purposes, we provide [nyc-taxi.sfo3.digitaloceanspaces.com](https://nyc-taxi.sfo3.digitaloceanspaces.com), 
+which is a public storage space with the NYC taxi data used in
+[Working with Arrow Datasets and dplyr](dataset.html).
+
+To connect to this space, you only need to adapt the code from the previous
+section:
+
+```r
+space <- arrow::S3FileSystem$create(
+  anonymous = TRUE,
+  scheme = "https",
+  endpoint_override = "sfo3.digitaloceanspaces.com"
+)
+```
+
+The space that we are using allows anonymous access, but if you were to 
+connect to a private space (e.g. one with sensitive data), you would need to 
+provide an access key and a secret key, for example:
+
+```r
+space <- arrow::S3FileSystem$create(
+  access_key = Sys.getenv("DO_ARROW_TAXI_TOKEN"),
+  secret_key = Sys.getenv("DO_ARROW_TAXI_SECRET"),
+  scheme = "https",
+  endpoint_override = "sfo3.digitaloceanspaces.com"
+)
+```
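+
+As a minimal sketch of what could come next, the filesystem object can be used 
+to list the contents of the space. This assumes that the bucket is named 
+`nyc-taxi` (inferred from the public URL above, not confirmed) and that 
+network access is available:
+
+```r
+library(arrow)
+
+# Anonymous connection to the public DigitalOcean endpoint
+space <- S3FileSystem$create(
+  anonymous = TRUE,
+  scheme = "https",
+  endpoint_override = "sfo3.digitaloceanspaces.com"
+)
+
+# Assumption: the bucket name "nyc-taxi" is taken from the public URL
+# List the objects at the top level of the bucket (requires network access)
+space$ls("nyc-taxi")
+```
+
+If the listing works, `space$path("nyc-taxi")` should give a `SubTreeFileSystem` 
+rooted at the bucket, which can be used in the same way as `june2019` earlier.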

Review comment:
       this is great



