You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/07 10:14:16 UTC

[GitHub] [arrow] milesgranger opened a new pull request, #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem and resolve_s3_region

milesgranger opened a new pull request, #14599:
URL: https://github.com/apache/arrow/pull/14599

   Will fix [ARROW-18238](https://issues.apache.org/jira/browse/ARROW-18238)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #14599:
URL: https://github.com/apache/arrow/pull/14599#issuecomment-1306777453

   @wjones127 Would you like to review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14599:
URL: https://github.com/apache/arrow/pull/14599#discussion_r1016340805


##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +178,19 @@ Example how you can read contents from a S3 bucket::
    >>> f.readall()
    b'some data'
 
+   # Note, to resolve the S3 bucket, one can use `resolve_s3_region('my-test-bucket')`

Review Comment:
   I would move this to an actual text paragraph instead of "hiding" it somewhat in a comment in a longer example. And maybe first stating that the S3FileSystem needs to be configured with the correct region for the bucket you want to read. And then if you want to do that automatically instead of passing the region manually (as in the example above), explain and show there are two ways you can do that.



##########
python/pyarrow/_s3fs.pyx:
##########
@@ -174,7 +174,8 @@ cdef class S3FileSystem(FileSystem):
         The frequency (in seconds) with which temporary credentials from an
         assumed role session will be refreshed.
     region : str, default 'us-east-1'
-        AWS region to connect to.
+        AWS region to connect to. Use :func:`pyarrow.fs.resolve_s3_region` to

Review Comment:
   The default nowadays is not always "us-east-1", but will also depend on some configuration / env variables. See the expanded section in the C++ code about this: https://github.com/apache/arrow/blob/5889c78e344688f8fa8100ecdf254cd701ee3445/cpp/src/arrow/filesystem/s3fs.h#L105-L109 
   That is maybe too detailed, but wondering if we should update it here a bit 



##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +178,19 @@ Example how you can read contents from a S3 bucket::
    >>> f.readall()
    b'some data'
 
+   # Note, to resolve the S3 bucket, one can use `resolve_s3_region('my-test-bucket')`
+   >>> s3 = fs.S3FileSystem(region=resolve_s3_region('my-test-bucket'))
+
+   # Or alternatively...
+   >>> s3 = fs.S3FileSystem.from_uri('s3://[access_key:secret_key@]bucket/path[?region=]')

Review Comment:
   ```suggestion
      >>> s3, path = fs.S3FileSystem.from_uri('s3://[access_key:secret_key@]bucket/path[?region=]')
   ```
   
   `from_uri` returns two elements. 
   
   Also, maybe leave out the `[?region]` part, since the goal here is to show that `from_uri` will infer the region from the bucket name (without the need to manually specify it)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14599:
URL: https://github.com/apache/arrow/pull/14599#issuecomment-1309373732

   Benchmark runs are scheduled for baseline = 92c9f9428b1c75c7f2458a13adeb305b0f1944cd and contender = 94cf74f3b5b58a0bb45e7123ea907f2f21319776. 94cf74f3b5b58a0bb45e7123ea907f2f21319776 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/06d317ffc65a410d8039ed91b6515680...ba1b289b9e324a7888f84ffc1e173610/)
   [Finished :arrow_down:0.27% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/856b756974bf4d4eb4b6a37c87058b30...6b0cebc4553149d0be4813e84419db67/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/9cdbce5a082c427e843bbb900c41b5ee...ce4608dba660432f9d2835fa70b438ca/)
   [Finished :arrow_down:0.14% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/8505a2b8f1714ff2b89680262a910ed4...eda4170daab94028b5948f344ac38ff0/)
   Buildkite builds:
   [Finished] [`94cf74f3` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1837)
   [Finished] [`94cf74f3` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1859)
   [Finished] [`94cf74f3` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1825)
   [Finished] [`94cf74f3` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1851)
   [Finished] [`92c9f942` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1836)
   [Finished] [`92c9f942` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1858)
   [Finished] [`92c9f942` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1824)
   [Finished] [`92c9f942` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1850)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #14599:
URL: https://github.com/apache/arrow/pull/14599#discussion_r1016857376


##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +179,31 @@ Example how you can read contents from a S3 bucket::
    >>> f.readall()
    b'some data'
 
+
+Note that it is important to configure :class:`S3FileSystem` with the correct
+region for the bucket being used. If `region` is not set, the AWS SDK will
+choose a value, defaulting to 'us-east-1' if the SDK version is <1.8.
+Otherwise it'll try to use a variety of heuristics to resolve the region.
+
+It is also possible to resolve the region for :class:`S3FileSystem` by using

Review Comment:
   ```suggestion
   It is also possible to resolve the region using the bucket name for :class:`S3FileSystem` by using
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14599:
URL: https://github.com/apache/arrow/pull/14599#discussion_r1016391670


##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +178,19 @@ Example how you can read contents from a S3 bucket::
    >>> f.readall()
    b'some data'
 
+   # Note, to resolve the S3 bucket, one can use `resolve_s3_region('my-test-bucket')`

Review Comment:
   Indeed, I think that's better. :+1: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14599:
URL: https://github.com/apache/arrow/pull/14599#discussion_r1016815088


##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +179,31 @@ Example how you can read contents from a S3 bucket::
    >>> f.readall()
    b'some data'
 
+
+Note that it is important to configure :class:`S3FileSystem` with the correct
+region for the bucket being used. If `region` is not set, the AWS SDK will
+choose a value, defaulting to 'us-east-1' if the SDK version is <1.8.
+Otherwise it'll try to use a variety of heuristics to resolve the region.

Review Comment:
   I would maybe add the "(environment variables, configuration profile, EC2 metadata server)" (copied from the C++ source), to make it clear that the "heuristics" don't include "infer it from the region"
   
   ```suggestion
   Otherwise it will try to use a variety of heuristics to resolve the region.
   ```



##########
docs/source/python/filesystems.rst:
##########
@@ -178,11 +179,31 @@ Example how you can read contents from a S3 bucket::
    >>> f.readall()
    b'some data'
 
+
+Note that it is important to configure :class:`S3FileSystem` with the correct
+region for the bucket being used. If `region` is not set, the AWS SDK will
+choose a value, defaulting to 'us-east-1' if the SDK version is <1.8.
+Otherwise it'll try to use a variety of heuristics to resolve the region.
+
+It is also possible to resolve the region for :class:`S3FileSystem` by using

Review Comment:
   ```suggestion
   It is also possible to resolve the region from the bucker name for :class:`S3FileSystem` by using
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem and resolve_s3_region

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14599:
URL: https://github.com/apache/arrow/pull/14599#issuecomment-1305385504

   https://issues.apache.org/jira/browse/ARROW-18238


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #14599: ARROW-18238: [Docs][Python] Improve docs for S3FileSystem

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche merged PR #14599:
URL: https://github.com/apache/arrow/pull/14599


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org