Posted to commits@arrow.apache.org by al...@apache.org on 2022/07/22 07:37:31 UTC

[arrow] branch master updated: ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow (#13681)

This is an automated email from the ASF dual-hosted git repository.

alenka pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 32016b1ade ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow (#13681)
32016b1ade is described below

commit 32016b1ade710a6585e2c1a1023d2e44a55420a8
Author: Rok Mihevc <ro...@mihevc.org>
AuthorDate: Fri Jul 22 09:37:23 2022 +0200

    ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow (#13681)
    
    This is to resolve [ARROW-16818](https://issues.apache.org/jira/browse/ARROW-16818).
    
    Lead-authored-by: Rok Mihevc <ro...@mihevc.org>
    Co-authored-by: Rok <ro...@mihevc.org>
    Signed-off-by: Alenka Frim <fr...@gmail.com>
---
 docs/source/python/api/filesystems.rst |  1 +
 docs/source/python/filesystems.rst     | 35 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/docs/source/python/api/filesystems.rst b/docs/source/python/api/filesystems.rst
index f84a3e229c..4b416abd9e 100644
--- a/docs/source/python/api/filesystems.rst
+++ b/docs/source/python/api/filesystems.rst
@@ -40,6 +40,7 @@ Filesystem Implementations
 
    LocalFileSystem
    S3FileSystem
+   GcsFileSystem
    HadoopFileSystem
    SubTreeFileSystem
 
diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 1ddb4dfa2b..a34ce88bae 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -40,6 +40,7 @@ Pyarrow implements natively the following filesystem subclasses:
 
 * :ref:`filesystem-localfs` (:class:`LocalFileSystem`)
 * :ref:`filesystem-s3` (:class:`S3FileSystem`)
+* :ref:`filesystem-gcs` (:class:`GcsFileSystem`)
 * :ref:`filesystem-hdfs` (:class:`HadoopFileSystem`)
 
 It is also possible to use your own fsspec-compliant filesystem with pyarrow functionalities as described in the section :ref:`filesystem-fsspec`.
@@ -183,6 +184,40 @@ Example how you can read contents from a S3 bucket::
    for the different ways to configure the AWS credentials.
 
 
+.. _filesystem-gcs:
+
+Google Cloud Storage File System
+--------------------------------
+
+PyArrow natively implements a file system backed by Google Cloud Storage
+(GCS).
+
+If not running on Google Cloud Platform (GCP), this generally requires the
+environment variable ``GOOGLE_APPLICATION_CREDENTIALS`` to point to a
+JSON file containing credentials.
+
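+As a minimal sketch of credentialed access (the key path below is only a
+placeholder), ``GOOGLE_APPLICATION_CREDENTIALS`` can be set before
+constructing the filesystem, which then resolves credentials from the
+environment::
+
+   >>> import os
+   >>> os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"
+   >>> from pyarrow import fs
+   >>> gcs = fs.GcsFileSystem()  # credentials resolved from the environment
+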
+Example showing how you can read contents from a GCS bucket::
+
+   >>> from datetime import timedelta
+   >>> from pyarrow import fs
+   >>> gcs = fs.GcsFileSystem(anonymous=True, retry_time_limit=timedelta(seconds=15))
+
+   # List all contents in a bucket, recursively
+   >>> uri = "gcp-public-data-landsat/LC08/01/001/003/"
+   >>> file_list = gcs.get_file_info(fs.FileSelector(uri, recursive=True))
+
+   # Open a file for reading and download its contents
+   >>> f = gcs.open_input_stream(file_list[0].path)
+   >>> f.read(64)
+   b'GROUP = FILE_HEADER\n  LANDSAT_SCENE_ID = "LC80010032013082LGN03"\n  S'
+
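+Writing goes through the same generic filesystem API. As a rough sketch
+(``my-bucket`` is only a placeholder and the call assumes credentials with
+write access to it)::
+
+   >>> gcs_rw = fs.GcsFileSystem()
+   >>> with gcs_rw.open_output_stream("my-bucket/example.txt") as out:
+   ...     out.write(b"hello from pyarrow")
+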
+.. seealso::
+
+   The :class:`GcsFileSystem` constructor by default uses the
+   process described in `GCS docs <https://google.aip.dev/auth/4110>`__
+   to resolve credentials.
+
+
 .. _filesystem-hdfs:
 
 Hadoop Distributed File System (HDFS)