Posted to commits@arrow.apache.org by fs...@apache.org on 2020/06/25 16:28:24 UTC

[arrow] branch master updated: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

This is an automated email from the ASF dual-hosted git repository.

fsaintjacques pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 8bd34e8  ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation
8bd34e8 is described below

commit 8bd34e869181b0dc4f03d15e989d9e511042790f
Author: François Saint-Jacques <fs...@gmail.com>
AuthorDate: Thu Jun 25 12:27:58 2020 -0400

    ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation
    
    Closes #7517 from fsaintjacques/ARROW-1682
    
    Authored-by: François Saint-Jacques <fs...@gmail.com>
    Signed-off-by: François Saint-Jacques <fs...@gmail.com>
---
 docs/source/python/dataset.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/source/python/dataset.rst b/docs/source/python/dataset.rst
index ae14e39..3d99834 100644
--- a/docs/source/python/dataset.rst
+++ b/docs/source/python/dataset.rst
@@ -325,6 +325,24 @@ The currently available classes are :class:`~pyarrow.fs.S3FileSystem` and
 details.
 
 
+Reading from MinIO
+------------------
+
+In addition to cloud storage, pyarrow also supports reading from a
+`MinIO <https://github.com/minio/minio>`_ object storage instance emulating S3
+APIs. Paired with `toxiproxy <https://github.com/shopify/toxiproxy>`_, this is
+useful for testing or benchmarking.
+
+.. code-block:: python
+
+    from pyarrow import fs
+
+    # By default, MinIO will listen for unencrypted HTTP traffic.
+    minio = fs.S3FileSystem(scheme="http", endpoint_override="localhost:9000")
+    dataset = ds.dataset("ursa-labs-taxi-data/", filesystem=minio,
+                         partitioning=["year", "month"])
+
+
 Manual specification of the Dataset
 -----------------------------------
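
The MinIO snippet in the hunk above hard-codes the scheme and the endpoint as separate arguments. A minimal sketch of deriving both from a single endpoint URL follows. It assumes that ``S3FileSystem`` accepts ``scheme`` and ``endpoint_override`` keywords; ``minio_s3_kwargs`` and ``open_taxi_dataset`` are hypothetical helper names, not part of pyarrow. The bucket name and partitioning are copied from the documented example.

```python
from urllib.parse import urlsplit


def minio_s3_kwargs(endpoint_url):
    """Split an endpoint URL into keyword arguments for S3FileSystem.

    Hypothetical helper for illustration; it is not a pyarrow API.
    """
    parts = urlsplit(endpoint_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # netloc keeps the "host:port" form used by the documented example.
    return {"scheme": parts.scheme, "endpoint_override": parts.netloc}


def open_taxi_dataset(endpoint_url="http://localhost:9000"):
    """Open the example dataset from a MinIO instance (needs a running server)."""
    # Imported lazily so minio_s3_kwargs can be used without pyarrow installed.
    import pyarrow.dataset as ds
    from pyarrow import fs

    minio = fs.S3FileSystem(**minio_s3_kwargs(endpoint_url))
    return ds.dataset("ursa-labs-taxi-data/", filesystem=minio,
                      partitioning=["year", "month"])
```

Keeping the URL parsing separate from the pyarrow call makes it possible to point the same code at a toxiproxy listener for benchmarking by changing only the URL, for example ``open_taxi_dataset("http://localhost:9001")``, where the port is an assumed proxy port.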