Posted to commits@arrow.apache.org by jo...@apache.org on 2021/11/19 20:15:40 UTC

[arrow] branch master updated: ARROW-13400 [R] Update fs.Rmd (Working with S3) vignette

This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new c0fe679  ARROW-13400 [R] Update fs.Rmd (Working with S3) vignette
c0fe679 is described below

commit c0fe679aa4369f3f0ea85209d385fdd89d79e3b3
Author: Dewey Dunnington <de...@fishandwhistle.net>
AuthorDate: Fri Nov 19 14:14:11 2021 -0600

    ARROW-13400 [R] Update fs.Rmd (Working with S3) vignette
    
    Just a few updates and fixes to rough edges according to the notes in ARROW-13400! In particular,
    
    - Added a section on using `proxy_options`
    - Added that you can use `$ls()` to view a directory listing (I found this useful when testing the S3 proxy server stuff)
    
    Closes #11729 from paleolimbot/r-s3-vignette
    
    Authored-by: Dewey Dunnington <de...@fishandwhistle.net>
    Signed-off-by: Jonathan Keane <jk...@gmail.com>
---
 r/vignettes/fs.Rmd | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/r/vignettes/fs.Rmd b/r/vignettes/fs.Rmd
index 5d699c4..6990469 100644
--- a/r/vignettes/fs.Rmd
+++ b/r/vignettes/fs.Rmd
@@ -32,7 +32,7 @@ For example, one of the NYC taxi data files used in `vignette("dataset", package
 s3://ursa-labs-taxi-data/2019/06/data.parquet
 ```
 
-Given this URI, we can pass it to `read_parquet()` just as if it were a local file path:
+Given this URI, you can pass it to `read_parquet()` just as if it were a local file path:
 
 ```r
 df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
@@ -54,7 +54,7 @@ This may be convenient when dealing with
 long URIs, and it's necessary for some options and authentication methods
 that aren't supported in the URI format.
 
-With a `FileSystem` object, we can point to specific files in it with the `$path()` method.
+With a `FileSystem` object, you can point to specific files in it with the `$path()` method.
 In the previous example, this would look like:
 
 ```r
@@ -62,13 +62,20 @@ bucket <- s3_bucket("ursa-labs-taxi-data")
 df <- read_parquet(bucket$path("2019/06/data.parquet"))
 ```
 
-See the help for `FileSystem` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
+You can list the files and/or directories in an S3 bucket or subdirectory using
+the `$ls()` method:
+
+```r
+bucket$ls()
+```
+
+See `help(FileSystem)` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
 can take. `region`, `scheme`, and `endpoint_override` can be encoded as query
 parameters in the URI (though `region` will be auto-detected in `s3_bucket()` or from the URI if omitted).
 `access_key` and `secret_key` can also be included,
 but other options are not supported in the URI.
 
-The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere, on S3 or elsewhere.
+The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere (on S3 or elsewhere).
 
 One way to get a subtree is to call the `$cd()` method on a `FileSystem`
 
@@ -86,21 +93,30 @@ june2019 <- SubTreeFileSystem$create("s3://ursa-labs-taxi-data/2019/06")
 ## Authentication
 
 To access private S3 buckets, you need typically need two secret parameters:
-a `access_key`, which is like a user id,
-and `secret_key`, like a token.
-There are a few options for passing these credentials:
+a `access_key`, which is like a user id, and `secret_key`, which is like a token
+or password. There are a few options for passing these credentials:
 
-1. Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/".
+- Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/" (e.g., `URLencode("123/456", reserved = TRUE)`).
 
-2. Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`
+- Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`
 
-3. Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.
+- Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.
 
-4. Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
+- Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
 
-You can also use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
+- Use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
 for temporary access by passing the `role_arn` identifier to `S3FileSystem$create()` or `s3_bucket()`.
 
+## Using a proxy server
+
+If you need to use a proxy server to connect to an S3 bucket, you can provide
+a URI in the form `http://user:password@host:port` to `proxy_options`. For
+example, a local proxy server running on port 1316 can be used like this:
+
+```r
+bucket <- s3_bucket("ursa-labs-taxi-data", proxy_options = "http://localhost:1316")
+```
+
 ## File systems that emulate S3
 
 The `S3FileSystem` machinery enables you to work with any file system that