You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/15 16:15:23 UTC

[GitHub] [arrow] marsupialtail opened a new pull request, #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

marsupialtail opened a new pull request, #13385:
URL: https://github.com/apache/arrow/pull/13385

   added timeout options to c++ and python s3fs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r921910110


##########
python/pyarrow/_s3fs.pyx:
##########
@@ -133,6 +133,21 @@ cdef class S3FileSystem(FileSystem):
         assumed role session will be refreshed.
     region : str, default 'us-east-1'
         AWS region to connect to.
+    request_timeout : double. unit in seconds. default 3. 
+        Socket read timeouts for HTTP clients on Windows. This should be more

Review Comment:
   Is this only for Windows?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r899256981


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;

Review Comment:
   I can try on Linux perhaps. Is there an easy way of testing this? Just point to a non-existing endpoint?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche merged PR #13385:
URL: https://github.com/apache/arrow/pull/13385


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1185952726

   I believe the dataset scanner build failures are unrelated to this PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r900691273


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options {
   /// the region (environment variables, configuration profile, EC2 metadata
   /// server).
   std::string region;
+  long request_timeout_ms = -1;
+  long connect_timeout_ms = -1;

Review Comment:
   @pitrou So changing it to double actually is a bit problematic because the underlying AWS config data types are long. We'd have to convert them back to long if we use double here. Please let me know what you'd prefer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
raulcd commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r898932660


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;
+    }
+    if (options_.connect_timeout_ms != -1) {
+      client_config_.connectTimeoutMs = options_.connect_timeout_ms;
+    }
+    if (!options_.region.empty()) {

Review Comment:
   I think this is already set on line 701



##########
python/pyarrow/_s3fs.pyx:
##########
@@ -156,7 +156,7 @@ cdef class S3FileSystem(FileSystem):
         CS3FileSystem* s3fs
 
     def __init__(self, *, access_key=None, secret_key=None, session_token=None,
-                 bint anonymous=False, region=None, scheme=None,
+                 bint anonymous=False, region=None, request_timeout_ms = None, connect_timeout_ms = None, scheme=None,

Review Comment:
   this will fail linting:
   ```suggestion
                    bint anonymous=False, region=None, request_timeout_ms=None, connect_timeout_ms=None, scheme=None,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r899244896


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;

Review Comment:
   At least it works on Mac for me... I'm not sure why the documentation says it's windows-specific. will look into this. I am aware that on this: https://sdk.amazonaws.com/cpp/api/LATEST/struct_aws_1_1_client_1_1_client_configuration.html it says its windows specific, but the official docs don't mention anything about that: https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/client-config.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r898941480


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options {
   /// the region (environment variables, configuration profile, EC2 metadata
   /// server).
   std::string region;
+  long request_timeout_ms = -1;
+  long connect_timeout_ms = -1;

Review Comment:
   We would probably want to be consistent accross the codebase and expose it as seconds, so perhaps:
   ```c++
     /// \brief Request timeout in seconds
     ///
     /// If not specified, the AWS SDK's default value will be used.
     util::optional<double> request_timeout;
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1205015689

   Benchmark runs are scheduled for baseline = ee874d67ddd417e5c33aff1979df782c4dfa1dfb and contender = 669009edfc4f7f1fe85a1de641e16f831eee45d6. 669009edfc4f7f1fe85a1de641e16f831eee45d6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e9a28d24e5b54d888129d436a1bf29f4...cea53e939ec140a291fce04b54fa500e/)
   [Finished :arrow_down:0.61% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/dc9c17f62a6a4387a9dc217bf011cd93...15f5a977afbf40c4963bc9a2fd6a6396/)
   [Finished :arrow_down:0.54% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6d37671adf174e8789675aaaa0ed644b...ca79dad6064049cc801bb436cf042196/)
   [Finished :arrow_down:0.18% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ed9feb48e85a4954b9cdc599bd019764...3ecc983a6f2a4e888983a54cdb204ada/)
   Buildkite builds:
   [Failed] [`669009ed` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1254)
   [Finished] [`669009ed` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1266)
   [Finished] [`669009ed` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1249)
   [Finished] [`669009ed` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1268)
   [Failed] [`ee874d67` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1253)
   [Finished] [`ee874d67` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1265)
   [Finished] [`ee874d67` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1248)
   [Finished] [`ee874d67` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1267)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1203740835

   Rebased and some improvements pushed. @jorisvandenbossche Would you like to take a look at the updated docstrings?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1181717930

   Again, I think we should refrain from adding some timeout or duration options in an unit other than seconds. Having to juggle between multiple is both cumbersom and a bug magnet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r922523313


##########
python/pyarrow/_s3fs.pyx:
##########
@@ -133,6 +133,21 @@ cdef class S3FileSystem(FileSystem):
         assumed role session will be refreshed.
     region : str, default 'us-east-1'
         AWS region to connect to.
+    request_timeout : double. unit in seconds. default 3. 
+        Socket read timeouts for HTTP clients on Windows. This should be more
+        than adequate for most services. However, if you are transfering large
+        amounts of data or are worried about higher latencies, you should set
+        to something that makes more sense for your use case. For Curl, it's
+        the low speed time, which contains the time in number milliseconds
+        that transfer speed should be below "lowSpeedLimit" for the library
+        to consider it too slow and abort. Note that for Curl this config is
+        converted to seconds by rounding down to the nearest whole second except
+        when the value is greater than 0 and less than 1000ms. In this case it is
+        set to one second. When it's 0, low speed limit check will be disabled.
+        Note that for Windows when this config is 0, the behavior is not specified by Windows.

Review Comment:
   updated



##########
python/pyarrow/_s3fs.pyx:
##########
@@ -133,6 +133,21 @@ cdef class S3FileSystem(FileSystem):
         assumed role session will be refreshed.
     region : str, default 'us-east-1'
         AWS region to connect to.
+    request_timeout : double. unit in seconds. default 3. 
+        Socket read timeouts for HTTP clients on Windows. This should be more

Review Comment:
   updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1185045318

   change the units to seconds. @pitrou can you please rerun your tests on ubuntu?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r910087214


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options {
   /// the region (environment variables, configuration profile, EC2 metadata
   /// server).
   std::string region;
+  long request_timeout_ms = -1;
+  long connect_timeout_ms = -1;

Review Comment:
   Sorry for the delay. We want to make the API easy to use and passing seconds is the most natural choice. So we'll have to live with a `double`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r913898236


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;

Review Comment:
   On Mac, both options are needed. using connectTimeoutMs without requestTimeoutMs leads to a hang when trying to open an input file. using requestTimeoutMs without connectTimeoutMs just leads to time out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r900266915


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options {
   /// the region (environment variables, configuration profile, EC2 metadata
   /// server).
   std::string region;
+  long request_timeout_ms = -1;
+  long connect_timeout_ms = -1;

Review Comment:
   Changed it to double



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r918425263


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -718,6 +718,14 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = std::lround(options_.request_timeout_ms);
+    }
+    if (options_.connect_timeout_ms != -1) {
+      client_config_.connectTimeoutMs = std::lround(options_.connect_timeout_ms);
+    }

Review Comment:
   ```suggestion
       if (options_.request_timeout_ms >= 0) {
         client_config_.requestTimeoutMs = std::lround(options_.request_timeout_ms);
       }
       if (options_.connect_timeout_ms >= 0) {
         client_config_.connectTimeoutMs = std::lround(options_.connect_timeout_ms);
       }
   ```
   
   We should probably either do `>=0` (suggestion above) or return an error if it is less than `-1`



##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options {
   /// the region (environment variables, configuration profile, EC2 metadata
   /// server).
   std::string region;
+  double request_timeout_ms = -1;
+  double connect_timeout_ms = -1;

Review Comment:
   Can you add comments here?  We should probably explain what the default is (i.e. what happens if you leave this as -1) as well as what 0 means.



##########
python/pyarrow/_s3fs.pyx:
##########
@@ -179,7 +179,7 @@ cdef class S3FileSystem(FileSystem):
         CS3FileSystem* s3fs
 
     def __init__(self, *, access_key=None, secret_key=None, session_token=None,
-                 bint anonymous=False, region=None, scheme=None,
+                 bint anonymous=False, region=None, request_timeout_ms=None, connect_timeout_ms=None, scheme=None,

Review Comment:
   Can you add these fields to the above documentation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1156681638

   How to test this: you can use MinIO and firewall tools on Mac to artificially limit your network connection. 
   1. Setup a MinIO server (default port is localhost:9000), make a bucket and put a simple text file in it. 
   2. Now change your network connection to 9000 to add a 10s delay each way by doing the following:
   ```
   sudo dnctl pipe 1 config plr 0.3 bw 100Kbit/s delay 10000 
    echo "dummynet in  proto tcp from any to localhost port 9000 pipe 1
         dummynet out proto tcp from any to localhost port 9000 pipe 1" | sudo pfctl -f -
   sudo pfctl -e   
   ```
   3. telnet 127.0.0.1 9000 should now establish connection after 20s. 
   4. Now if you don't specify timeout options when making a S3FileSystem in Python, it will timeout. 
   5. However if you specify the following options: s3 = fs.S3FileSystem(access_key="minioadmin",secret_key="minioadmin",endpoint_override='127.0.0.1:9000',scheme="http",connect_timeout_ms=100000, request_timeout_ms=100000) it will work!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1156679758

   https://issues.apache.org/jira/browse/ARROW-16521


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r898942581


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;

Review Comment:
   According to the doc, `requestTimeoutMs` is Windows-specific? Ideally this should work on all platforms.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r921911318


##########
python/pyarrow/_s3fs.pyx:
##########
@@ -133,6 +133,21 @@ cdef class S3FileSystem(FileSystem):
         assumed role session will be refreshed.
     region : str, default 'us-east-1'
         AWS region to connect to.
+    request_timeout : double. unit in seconds. default 3. 
+        Socket read timeouts for HTTP clients on Windows. This should be more
+        than adequate for most services. However, if you are transfering large
+        amounts of data or are worried about higher latencies, you should set
+        to something that makes more sense for your use case. For Curl, it's
+        the low speed time, which contains the time in number milliseconds
+        that transfer speed should be below "lowSpeedLimit" for the library
+        to consider it too slow and abort. Note that for Curl this config is
+        converted to seconds by rounding down to the nearest whole second except
+        when the value is greater than 0 and less than 1000ms. In this case it is
+        set to one second. When it's 0, low speed limit check will be disabled.
+        Note that for Windows when this config is 0, the behavior is not specified by Windows.

Review Comment:
   Are those details about curl relevant enough to include here? (especially if we already use seconds, the details on how milliseconds get rounded are not that important?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r929399076


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -103,6 +103,12 @@ struct ARROW_EXPORT S3Options {
   /// server).
   std::string region;
 
+  /// if unset (or any value less than 0), the AWS SDK default values will be used.
+  /// request_timeout default 3 seconds
+  /// connect_timeout default 1 second
+  double request_timeout = -1;
+  double connect_timeout = -1;

Review Comment:
   ```suggestion
     /// Socket read timeout, the default is 3 seconds.
     /// If less than zero then this property is ignored and the default is used.
     /// If zero then ???
     double request_timeout = -1;
     /// Socket connect timeout, the default is 1 second.
     /// If less than zero then this property is ignored and the default is used.
     /// If zero then ???
     double connect_timeout = -1;
   ```
   
   I don't think we need to give as much detail as we do in python (although maybe we should) but we should be unambiguous.  Note that this comment is used to automatically generate this API doc: https://arrow.apache.org/docs/cpp/api/filesystem.html#_CPPv4N5arrow2fs9S3OptionsE
   
   So even though there is a bit of repetition it helps to have a comment for each field and not to bunch them up like this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r910978555


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;

Review Comment:
   Ok, I tried on Linux. `connectTimeoutMs` seems to work there but `requestTimeoutMs` is ignored. Which setting do you need?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][Python] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r914301758


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options {
   /// the region (environment variables, configuration profile, EC2 metadata
   /// server).
   std::string region;
+  long request_timeout_ms = -1;
+  long connect_timeout_ms = -1;

Review Comment:
   addressed this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
marsupialtail commented on code in PR #13385:
URL: https://github.com/apache/arrow/pull/13385#discussion_r899297368


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -701,6 +701,17 @@ class ClientBuilder {
     if (!options_.region.empty()) {
       client_config_.region = ToAwsString(options_.region);
     }
+    if (options_.request_timeout_ms != -1) {
+      client_config_.requestTimeoutMs = options_.request_timeout_ms;

Review Comment:
   Yes you can refer to my comment above. On linux I think you can use netem. 
   https://stackoverflow.com/questions/10481032/bandwidth-throttling-using-netem



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13385:
URL: https://github.com/apache/arrow/pull/13385#issuecomment-1156679787

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org