Posted to commits@arrow.apache.org by al...@apache.org on 2022/07/22 07:37:31 UTC
[arrow] branch master updated: ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow (#13681)
This is an automated email from the ASF dual-hosted git repository.
alenka pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 32016b1ade ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow (#13681)
32016b1ade is described below
commit 32016b1ade710a6585e2c1a1023d2e44a55420a8
Author: Rok Mihevc <ro...@mihevc.org>
AuthorDate: Fri Jul 22 09:37:23 2022 +0200
ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow (#13681)
This is to resolve [ARROW-16818](https://issues.apache.org/jira/browse/ARROW-16818).
Lead-authored-by: Rok Mihevc <ro...@mihevc.org>
Co-authored-by: Rok <ro...@mihevc.org>
Signed-off-by: Alenka Frim <fr...@gmail.com>
---
docs/source/python/api/filesystems.rst | 1 +
docs/source/python/filesystems.rst | 35 ++++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/docs/source/python/api/filesystems.rst b/docs/source/python/api/filesystems.rst
index f84a3e229c..4b416abd9e 100644
--- a/docs/source/python/api/filesystems.rst
+++ b/docs/source/python/api/filesystems.rst
@@ -40,6 +40,7 @@ Filesystem Implementations
LocalFileSystem
S3FileSystem
+ GcsFileSystem
HadoopFileSystem
SubTreeFileSystem
diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 1ddb4dfa2b..a34ce88bae 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -40,6 +40,7 @@ Pyarrow implements natively the following filesystem subclasses:
* :ref:`filesystem-localfs` (:class:`LocalFileSystem`)
* :ref:`filesystem-s3` (:class:`S3FileSystem`)
+* :ref:`filesystem-gcs` (:class:`GcsFileSystem`)
* :ref:`filesystem-hdfs` (:class:`HadoopFileSystem`)
It is also possible to use your own fsspec-compliant filesystem with pyarrow functionalities as described in the section :ref:`filesystem-fsspec`.
@@ -183,6 +184,40 @@ Example how you can read contents from a S3 bucket::
for the different ways to configure the AWS credentials.
+.. _filesystem-gcs:
+
+Google Cloud Storage File System
+--------------------------------
+
+PyArrow natively implements a file system backed by Google Cloud
+Storage (GCS).
+
+If not running on Google Cloud Platform (GCP), this generally requires the
+environment variable ``GOOGLE_APPLICATION_CREDENTIALS`` to point to a
+JSON file containing credentials.
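[Editorial aside, not part of the committed docs: outside GCP the credentials file is typically supplied through the environment before starting the Python process. The path below is a placeholder, not a real key file.]

```shell
# Placeholder path: point this at your own service-account key JSON.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```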
+
+Example showing how to read contents from a GCS bucket::
+
+ >>> from datetime import timedelta
+ >>> from pyarrow import fs
+ >>> gcs = fs.GcsFileSystem(anonymous=True, retry_time_limit=timedelta(seconds=15))
+
+ # List all contents in a bucket, recursively
+ >>> uri = "gcp-public-data-landsat/LC08/01/001/003/"
+ >>> file_list = gcs.get_file_info(fs.FileSelector(uri, recursive=True))
+
+ # Open a file for reading and download its contents
+ >>> f = gcs.open_input_stream(file_list[0].path)
+ >>> f.read(64)
+ b'GROUP = FILE_HEADER\n LANDSAT_SCENE_ID = "LC80010032013082LGN03"\n S'
+
+.. seealso::
+
+ The :class:`GcsFileSystem` constructor by default uses the
+ process described in `GCS docs <https://google.aip.dev/auth/4110>`__
+ to resolve credentials.
+
+
.. _filesystem-hdfs:
Hadoop Distributed File System (HDFS)