You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "kindly (via GitHub)" <gi...@apache.org> on 2023/04/24 20:37:53 UTC

[GitHub] [arrow-rs] kindly opened a new pull request, #4120: Retry on all request errors.

kindly opened a new pull request, #4120:
URL: https://github.com/apache/arrow-rs/pull/4120

   Request errors can happen intermittently for a variety of reasons i.e patchy internet, hyper holding on to request for too long, TLS errors, broken pipes.
   
   This increases the types of issues that can be retried.
   
   # Which issue does this PR close?
   
   Closes #4119 
   
   # Rationale for this change
    
   Only certain request errors are retried currently, and I was experiencing very unreliable multipart uploads, because if a single chunk failed the whole upload would fail.  Mulitpart uploads caused more issues due to sometimes being a delay between put request for each chunk and hyper, sometimes retrying a dead connection or having network issues.
   
   # What changes are included in this PR?
   
   Allow retries when reqwest fails due to request errors. 
   
   # Are there any user-facing changes?
   
   Fewer failures when getting or uploading data to object stores.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] kindly commented on pull request #4120: Retry on all request errors.

Posted by "kindly (via GitHub)" <gi...@apache.org>.
kindly commented on PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#issuecomment-1522302614

   @tustvold I have had difficulty making a useful test case for this.  The `MockServer` currently works on the level of a Response object, and this will not have one, only a `hyper` error.  I have not been able to find a way to make the mock server misbehave on purpose, like prematurely close a connection.  I think I would need to get the raw socket, but even with this I am not sure how realistic a test case I could make.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4120: Retry on all request errors.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#issuecomment-1522462215

   >  I have not been able to find a way to make the mock server to misbehave on purpose, like prematurely close a connection
   
   What happens if you `MockServer::push_fn` a function that panics, I _think_ this should drop the connection without returning a response?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4120: Retry on Connection Errors

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4120: Retry on all request errors.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#discussion_r1176373372


##########
object_store/src/client/retry.rs:
##########
@@ -192,11 +192,20 @@ impl RetryExt for reqwest::RequestBuilder {
                     },
                     Err(e) =>
                     {
-                        return Err(Error{
-                            retries,
-                            message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                        if retries == max_retries
+                            || now.elapsed() > retry_timeout
+                            || !e.is_request() {

Review Comment:
   Unless I am mistaken this will retry request and connection timeouts, along with malformed responses from the server, is this desirable?
   
   Perhaps we could do something like
   
   ```
   if let Some(e) = e.source().downcast_ref::<hyper::Error>() {
       if e.is_connect() || e.is_closed() {
           // Do retry
       }
   }
   ```
   
   This would allow limiting the potential blast radius of this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] kindly commented on a diff in pull request #4120: Retry on all request errors.

Posted by "kindly (via GitHub)" <gi...@apache.org>.
kindly commented on code in PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#discussion_r1176918637


##########
object_store/src/client/retry.rs:
##########
@@ -192,11 +192,20 @@ impl RetryExt for reqwest::RequestBuilder {
                     },
                     Err(e) =>
                     {
-                        return Err(Error{
-                            retries,
-                            message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                        if retries == max_retries
+                            || now.elapsed() > retry_timeout
+                            || !e.is_request() {

Review Comment:
   @tustvold 
   I have basically done what you have said here.  I added another condition `is_incomplete_message()` which was coming up for me when loading a large file, not on a stable connection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on a diff in pull request #4120: Retry on all request errors.

Posted by "crepererum (via GitHub)" <gi...@apache.org>.
crepererum commented on code in PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#discussion_r1176456377


##########
object_store/src/client/retry.rs:
##########
@@ -192,11 +192,20 @@ impl RetryExt for reqwest::RequestBuilder {
                     },
                     Err(e) =>
                     {
-                        return Err(Error{
-                            retries,
-                            message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                        if retries == max_retries
+                            || now.elapsed() > retry_timeout
+                            || !e.is_request() {

Review Comment:
   I agree that only server side (5xx) and transport errors should be retried, not all error codes. Esp. client side errors (4xx) should NOT be blindly retried.
   
   So I guess the system could be: if it is an HTTP error code, filter the number range to 5xx. For all other errors use the flags to figure out their nature (`is_connect`/`is_closed` etc.).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] kindly commented on pull request #4120: Retry on all request errors.

Posted by "kindly (via GitHub)" <gi...@apache.org>.
kindly commented on PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#issuecomment-1522919547

   @tustvold yes, that did work.  
   It resulted in an `incomplete message` error on the client that is now caught and retried.
   
   Added tests where the retry succeeded, and when it failed after max retries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] kindly commented on a diff in pull request #4120: Retry on all request errors.

Posted by "kindly (via GitHub)" <gi...@apache.org>.
kindly commented on code in PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#discussion_r1176918637


##########
object_store/src/client/retry.rs:
##########
@@ -192,11 +192,20 @@ impl RetryExt for reqwest::RequestBuilder {
                     },
                     Err(e) =>
                     {
-                        return Err(Error{
-                            retries,
-                            message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                        if retries == max_retries
+                            || now.elapsed() > retry_timeout
+                            || !e.is_request() {

Review Comment:
   @tustvold 
   I have basically done what you have said here.  I added another condition `is_incomplete_message()` which was coming up for me when loading a large file, on an unstable connection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on a diff in pull request #4120: Retry on all request errors.

Posted by "crepererum (via GitHub)" <gi...@apache.org>.
crepererum commented on code in PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#discussion_r1176456377


##########
object_store/src/client/retry.rs:
##########
@@ -192,11 +192,20 @@ impl RetryExt for reqwest::RequestBuilder {
                     },
                     Err(e) =>
                     {
-                        return Err(Error{
-                            retries,
-                            message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                        if retries == max_retries
+                            || now.elapsed() > retry_timeout
+                            || !e.is_request() {

Review Comment:
   I agree that only server side (5xx) and transport errors should be retried, not all error codes. Esp. client side errors (4xx) should NOT be blindly retried.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] kindly commented on a diff in pull request #4120: Retry on all request errors.

Posted by "kindly (via GitHub)" <gi...@apache.org>.
kindly commented on code in PR #4120:
URL: https://github.com/apache/arrow-rs/pull/4120#discussion_r1176924910


##########
object_store/src/client/retry.rs:
##########
@@ -192,11 +192,20 @@ impl RetryExt for reqwest::RequestBuilder {
                     },
                     Err(e) =>
                     {
-                        return Err(Error{
-                            retries,
-                            message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                        if retries == max_retries
+                            || now.elapsed() > retry_timeout
+                            || !e.is_request() {

Review Comment:
   @crepererum
   These retries only cover cases where there is no response at all, therefore no response codes at all.  The retry condition for requests with responses is `is_server_error` which I assume means 5xx errors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org