Posted to commits@arrow.apache.org by we...@apache.org on 2021/09/10 23:46:24 UTC

[arrow-cookbook] branch main updated: Adding anonymous flag to s3 (#70)

This is an automated email from the ASF dual-hosted git repository.

westonpace pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new 9750a64  Adding anonymous flag to s3 (#70)
9750a64 is described below

commit 9750a6402436f0379a9a7bde4184076c615f5a93
Author: Tomek Drabas <dr...@gmail.com>
AuthorDate: Fri Sep 10 16:46:18 2021 -0700

    Adding anonymous flag to s3 (#70)
    
    * Adding anonymous flag to s3
    
    * Fixing missing comma
    
    * Info about s3 credentials
---
 python/source/io.rst | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/python/source/io.rst b/python/source/io.rst
old mode 100644
new mode 100755
index 2c1fd82..db03d74
--- a/python/source/io.rst
+++ b/python/source/io.rst
@@ -394,7 +394,10 @@ partitioned data coming from remote sources like S3 or HDFS.
     from pyarrow import fs
 
     # List content of s3://ursa-labs-taxi-data/2011
-    s3 = fs.SubTreeFileSystem("ursa-labs-taxi-data", fs.S3FileSystem(region="us-east-2"))
+    s3 = fs.SubTreeFileSystem(
+        "ursa-labs-taxi-data", 
+        fs.S3FileSystem(region="us-east-2", anonymous=True)
+    )
     for entry in s3.get_file_info(fs.FileSelector("2011", recursive=True)):
         if entry.type == fs.FileType.File:
             print(entry.path)
@@ -419,7 +422,7 @@ by ``month`` using
 
 .. testcode::
 
-    dataset = ds.dataset("s3://ursa-labs-taxi-data/2011", 
+    dataset = ds.dataset("s3://ursa-labs-taxi-data/2011",
                          partitioning=["month"])
     for f in dataset.files[:10]:
         print(f)
@@ -447,6 +450,27 @@ or :meth:`pyarrow.dataset.Dataset.to_batches` like you would for a local one.
     It is possible to load partitioned data also in the ipc arrow
     format or in feather format.
 
+.. warning::
+
+    If the above code throws an error, the most likely reason is that
+    your AWS credentials are not set. Follow these instructions to obtain
+    an ``AWS Access Key Id`` and ``AWS Secret Access Key``:
+    `AWS Credentials <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html>`_.
+
+    The credentials are normally stored in the ``~/.aws/credentials`` file
+    (on Mac or Linux) or in ``C:\Users\<USERNAME>\.aws\credentials`` (on Windows).
+    You will need to either create or update this file in the appropriate location.
+
+    The contents of the file should look like this:
+
+    .. code-block:: ini
+
+        [default]
+        aws_access_key_id=<YOUR_AWS_ACCESS_KEY_ID>
+        aws_secret_access_key=<YOUR_AWS_SECRET_ACCESS_KEY>
+
+
+
 Write a Feather file
 ====================
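For reference, a minimal sketch of what the new ``anonymous=True`` flag
enables: listing a public bucket without any AWS credentials configured.
This assumes pyarrow is built with S3 support; it is an illustration, not
part of the commit.

    from pyarrow import fs

    # anonymous=True skips the credential lookup entirely, so no
    # ~/.aws/credentials file or environment variables are required.
    s3 = fs.S3FileSystem(region="us-east-2", anonymous=True)

    # List the files under ursa-labs-taxi-data/2011, as in the docs above.
    selector = fs.FileSelector("ursa-labs-taxi-data/2011", recursive=True)
    for entry in s3.get_file_info(selector):
        if entry.type == fs.FileType.File:
            print(entry.path)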
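And for the credentialed case described in the new warning, keys can also be
passed to the filesystem directly instead of via the credentials file. The
key values below are placeholders; ``access_key`` and ``secret_key`` are
standard ``pyarrow.fs.S3FileSystem`` parameters.

    from pyarrow import fs

    # Explicit credentials, equivalent to the [default] profile in
    # ~/.aws/credentials. The values here are placeholders, not real keys.
    s3 = fs.S3FileSystem(
        region="us-east-2",
        access_key="<YOUR_AWS_ACCESS_KEY_ID>",
        secret_key="<YOUR_AWS_SECRET_ACCESS_KEY>",
    )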