You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/09 16:20:19 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #4040: Add with_tokio_runtime to HTTP stores

tustvold opened a new pull request, #4040:
URL: https://github.com/apache/arrow-rs/pull/4040

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   This allows isolating IO into a separate pool, allowing using object_store outside of a tokio context
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#issuecomment-1724082933

   Looping back to this I think this problem is ill-formed. There are two major use-cases for this functionality:
   
   1. Supporting object_store on threads not managed by tokio
   2. Support object_store in systems containing multiple thread pools with different tail latencies
   
   The first use-case is better served by integrating tokio at a higher level, e.g. using `Handle::enter` at the thread level.
   
   It is unclear how to handle the second use-case at a library level. The use of a second threadpool implies that the primary threadpool may have very high tail latencies. The problem is determining at what point this should result in back pressure on the underlying TCP connection. As written this PR will not change the way that this backpressure occurs, should the task not get scheduled on the high tail latency threadpool, nothing will drain the TCP socket, and TCP backpressure will occur. The approach in https://github.com/apache/arrow-rs/pull/4015 instead uses a queue with capacity for a single chunk, which will delay this TCP backpressure very slightly. You could increase the queue size, or make a more sophisticated queue that buffers a given number of bytes, but you're it is unclear how this should be controlled.
   
   Taking a step back this feels like the wrong way to solve this problem, ultimately IO should be segregated from compute at a meaningful application task boundary, rather than at the object_store interface. For example, AsyncFileReader::get_bytes could perform the IO to fetch a given chunk of data on a separate thread pool. This avoids object_store having to make decisions about how much buffering is too much, etc...
   
   I am therefore going to close this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed pull request #4040: Add with_tokio_runtime to HTTP stores
URL: https://github.com/apache/arrow-rs/pull/4040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#discussion_r1161309686


##########
object_store/src/client/retry.rs:
##########
@@ -190,26 +252,36 @@ impl RetryExt for reqwest::RequestBuilder {
                             tokio::time::sleep(sleep).await;
                         }
                     },
-                    Err(e) =>
-                    {
-                        return Err(Error{
+                    Err(e) => {
+                        return Err(Error::Retry(RetryError {
                             retries,
                             message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                            source: Some(e),
+                        }))
                     }
                 }
             }
+        };
+
+        match config.runtime.as_ref() {
+            Some(handle) => handle
+                .spawn(fut)

Review Comment:
   It is worth highlighting that this only spawns the code that generates the Response, the Response streaming can and will take place in the calling context. This is perfectly acceptable as the mio reactor registration will have occurred already, the futures plumbing is runtime agnostic



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#discussion_r1161324324


##########
object_store/src/http/mod.rs:
##########
@@ -271,8 +283,16 @@ mod tests {
 
     use super::*;
 
-    #[tokio::test]
-    async fn http_test() {
+    /// Deletes any directories left behind from previous tests
+    async fn cleanup_directories(integration: &HttpStore) {

Review Comment:
   This is necessary because we now run the test twice, and the directories left behind cause tests of list_with_delimiter to fail.
   
   I have confirmed that this behaviour of returning common prefixes for empty directories is consistent with LocalFileSystem. The reason we don't run into this with LocalFileSystem is that it creates a new temp directory for each test



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#issuecomment-1507237052

   Marking as a draft whilst I think a bit more on this, another option might be to do something similar to https://docs.rs/async-compat/latest/async_compat/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#discussion_r1161309382


##########
object_store/src/client/retry.rs:
##########
@@ -121,22 +166,37 @@ impl Default for RetryConfig {
     }
 }
 
+/// Crate-private client configuration
+///
+/// Specifically this is the config passed to [`RetryExt::send_retry`]
+///
+/// This is unlike the public [`ClientOptions`](crate::ClientOptions) which contains just
+/// the properties used to construct [`Client`](reqwest::Client)
+#[derive(Debug, Clone, Default)]
+pub struct ClientConfig {

Review Comment:
   The separation of ClientOptions and ClientConfig is perhaps a little derived, but ClientConfig is a crate-private implementation detail and so I think this is fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#discussion_r1161309339


##########
object_store/src/client/retry.rs:
##########
@@ -23,17 +23,62 @@ use futures::FutureExt;
 use reqwest::header::LOCATION;
 use reqwest::{Response, StatusCode};
 use std::time::{Duration, Instant};
+use tokio::runtime::Handle;
+use tokio::task::JoinError;
 use tracing::info;
 
+#[derive(Debug)]

Review Comment:
   This module is not public, and so these changes are not breaking



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on a diff in pull request #4040: Add with_tokio_runtime to HTTP stores

Posted by "crepererum (via GitHub)" <gi...@apache.org>.
crepererum commented on code in PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#discussion_r1162541501


##########
object_store/src/client/retry.rs:
##########
@@ -190,26 +252,36 @@ impl RetryExt for reqwest::RequestBuilder {
                             tokio::time::sleep(sleep).await;
                         }
                     },
-                    Err(e) =>
-                    {
-                        return Err(Error{
+                    Err(e) => {
+                        return Err(Error::Retry(RetryError {
                             retries,
                             message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                            source: Some(e),
+                        }))
                     }
                 }
             }
+        };
+
+        match config.runtime.as_ref() {
+            Some(handle) => handle
+                .spawn(fut)

Review Comment:
   I'm not sure I follow your argument here. The underlying socket is registered w/ the IO runtime and so is the mio reactor. However we still cross-poll. So is our assumption that when polling data from the IO runtime to which mio just has written to, mio will never change its mind and "jump" to another runtime?



##########
object_store/src/client/retry.rs:
##########
@@ -190,26 +252,36 @@ impl RetryExt for reqwest::RequestBuilder {
                             tokio::time::sleep(sleep).await;
                         }
                     },
-                    Err(e) =>
-                    {
-                        return Err(Error{
+                    Err(e) => {
+                        return Err(Error::Retry(RetryError {
                             retries,
                             message: "request error".to_string(),
-                            source: Some(e)
-                        })
+                            source: Some(e),
+                        }))
                     }
                 }
             }
+        };
+
+        match config.runtime.as_ref() {
+            Some(handle) => handle
+                .spawn(fut)

Review Comment:
   Might be worth to put your entire comment in code as a code comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org