Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/22 18:16:56 UTC

[GitHub] [arrow] fsaintjacques opened a new pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

fsaintjacques opened a new pull request #7517:
URL: https://github.com/apache/arrow/pull/7517


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques closed pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
fsaintjacques closed pull request #7517:
URL: https://github.com/apache/arrow/pull/7517


   





[GitHub] [arrow] pitrou commented on a change in pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7517:
URL: https://github.com/apache/arrow/pull/7517#discussion_r445450482



##########
File path: docs/source/python/dataset.rst
##########
@@ -325,6 +325,22 @@ The currently available classes are :class:`~pyarrow.fs.S3FileSystem` and
 details.
 
 
+Reading from Minio
+------------------
+
+In addition to cloud storage, pyarrow also supports reading from a MinIO object
+storage instance emulating S3 APIs. Paired with toxiproxy, this is useful for

Review comment:
       Can you add hyperlinks to MinIO and toxiproxy?







[GitHub] [arrow] pitrou commented on a change in pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7517:
URL: https://github.com/apache/arrow/pull/7517#discussion_r445450752



##########
File path: docs/source/python/dataset.rst
##########
@@ -325,6 +325,22 @@ The currently available classes are :class:`~pyarrow.fs.S3FileSystem` and
 details.
 
 
+Reading from Minio
+------------------
+
+In addition to cloud storage, pyarrow also supports reading from a MinIO object
+storage instance emulating S3 APIs. Paired with toxiproxy, this is useful for
+testing or benchmarking.
+
+.. code-block:: python
+
+    from pyarrow import fs
+
+    minio = fs.S3FileSystem(scheme="http", endpoint="localhost:9000")

Review comment:
       Add a comment that this assumes MinIO is running unencrypted on local port 9000?







[GitHub] [arrow] github-actions[bot] commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7517:
URL: https://github.com/apache/arrow/pull/7517#issuecomment-647699887


   https://issues.apache.org/jira/browse/ARROW-1682





[GitHub] [arrow] alippai commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #7517:
URL: https://github.com/apache/arrow/pull/7517#issuecomment-648247832


   Thanks, now I understand. So the pairing with toxiproxy is for the testing :))
   That's what you wrote, I just misunderstood





[GitHub] [arrow] fsaintjacques commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7517:
URL: https://github.com/apache/arrow/pull/7517#issuecomment-648244980


   I can't comment on the production quality of MinIO since I've never used it in such a scenario. I meant this as a reference for other developers who want to test the S3 bindings without having to use an actual S3 bucket. For example, I have a dataset stored locally. When I want to test the difference between local and cloud storage, I spin up a MinIO instance pointing to the same local directory and benchmark accordingly. toxiproxy is used to introduce latency and bandwidth limits mimicking S3.
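
   The local-versus-MinIO comparison described above could look roughly like the sketch below. It is not taken from the pull request: the helper names (`best_time`, `compare_local_and_minio`), the `minioadmin` credentials, the Parquet format, and the paths are all assumptions for illustration. It assumes a MinIO server serving the same directory, unencrypted, on `localhost:9000`. To add latency or bandwidth limits, `endpoint` would instead point at whatever port a toxiproxy proxy in front of MinIO listens on.

```python
import time


def best_time(read_fn, repeats=3):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_local_and_minio(local_dir, bucket_path, endpoint="localhost:9000",
                            access_key="minioadmin", secret_key="minioadmin"):
    """Time a full read of the same dataset from local disk and from MinIO.

    `bucket_path` is the "bucket/prefix" path under which MinIO exposes the
    data (hypothetical; it depends on how the server was started).
    """
    # Imported lazily so that best_time() stays usable without pyarrow.
    import pyarrow.dataset as ds
    from pyarrow import fs

    # Assumption: MinIO (or a proxy in front of it) speaks plain HTTP here.
    minio = fs.S3FileSystem(scheme="http", endpoint_override=endpoint,
                            access_key=access_key, secret_key=secret_key)

    local = ds.dataset(local_dir, format="parquet")
    remote = ds.dataset(bucket_path, format="parquet", filesystem=minio)

    return {
        "local": best_time(local.to_table),
        "minio": best_time(remote.to_table),
    }
```

   Taking the minimum over a few repeats is one simple way to reduce noise. A real benchmark would probably also want to control for the operating system's page cache, which this sketch does not attempt.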





[GitHub] [arrow] alippai commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #7517:
URL: https://github.com/apache/arrow/pull/7517#issuecomment-647789155


   Does "for testing and benchmarking" mean that it's not optimal to store Arrow/Parquet files on MinIO for production workloads?

