You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/04/12 02:23:13 UTC

[GitHub] [arrow-rs] wjones127 opened a new pull request, #4060: feat: support bulk deletes in object_store

wjones127 opened a new pull request, #4060:
URL: https://github.com/apache/arrow-rs/pull/4060

   # Which issue does this PR close?
   
   Closes #2615.
   
   # Rationale for this change
    
   Provides methods for quickly deleting large numbers of objects, such as when dropping a Parquet table.
   
   # What changes are included in this PR?
   
   Introduces two new methods on `ObjectStore`, each with a default implementation. One provides the bulk deletion method. Another provides the number of objects that can be deleted in one underlying call. The latter can be used if the user wants to control the parallelism themselves or if they want to implement progress tracking.
   
   # Are there any user-facing changes?
   
   Adds new APIs, with inline documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201519738


##########
object_store/src/util.rs:
##########
@@ -170,6 +178,22 @@ fn merge_ranges(
     ret
 }
 
+/// Common implementation for delete_all
+#[allow(dead_code)]
+pub(crate) fn delete_all_helper<'a>(

Review Comment:
   TBH, I think the helper turned out not to be necessary, so I removed it entirely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201366742


##########
object_store/src/lib.rs:
##########
@@ -1119,6 +1189,59 @@ mod tests {
             assert_eq!(actual, expected, "{prefix:?} - {offset:?}");
         }
 
+        // Test bulk delete
+        let paths = vec![
+            Path::from("a/a.file"),
+            Path::from("a/a/b.file"),
+            Path::from("aa/a.file"),
+            Path::from("ab/a.file"),
+            Path::from("a/😀.file"),
+        ];
+
+        let out_paths = storage
+            .delete_stream(futures::stream::iter(paths.clone()).boxed())
+            .buffered(5)
+            .map_ok(futures::stream::iter)
+            .try_flatten()
+            .try_collect::<Vec<_>>()
+            .await
+            .unwrap();
+
+        for path in &paths {
+            let err = storage.head(path).await.unwrap_err();
+            assert!(matches!(err, crate::Error::NotFound { .. }), "{}", err);
+        }
+
+        // Some object stores return results out of order (for example, S3)

Review Comment:
   That's fair. It will also make it easier to write a test with mixed success and failure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201520153


##########
object_store/src/aws/client.rs:
##########
@@ -243,6 +304,83 @@ impl S3Client {
         Ok(())
     }
 
+    /// Make an S3 Delete Objects request <https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html>
+    ///
+    /// Produces a vector of results, one for each path in the input vector. If
+    /// the delete was successful, the path is returned in the `Ok` variant. If
+    /// there was an error for a certain path, the error will be returned in the
+    /// vector. If there was an issue with making the overall request, an error
+    /// will be returned at the top level.
+    pub async fn bulk_delete_request(
+        &self,
+        paths: Vec<Path>,
+    ) -> Result<Vec<Result<Path>>> {
+        if paths.is_empty() {
+            return Ok(Vec::new());
+        }
+
+        let credential = self.get_credential().await?;
+        let url = format!("{}?delete", self.config.bucket_endpoint);
+
+        let inner_body = paths
+            .iter()
+            .map(|path| format!("<Object><Key>{}</Key></Object>", path))

Review Comment:
   I've added a test with characters special to XML



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1202512271


##########
object_store/src/aws/mod.rs:
##########
@@ -252,11 +252,13 @@ impl ObjectStore for AmazonS3 {
 
     fn delete_stream<'a>(
         &'a self,
-        locations: BoxStream<'a, Path>,
+        locations: BoxStream<'a, Result<Path>>,
     ) -> BoxStream<'a, Result<Path>> {
         locations
             .chunks(1_000)
             .map(move |locations| async {
+                let locations: Vec<Path> =
+                    locations.into_iter().collect::<Result<_>>()?;

Review Comment:
   I think it would be better to use try_chunks, as it will short-circuit on the first error



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1200244730


##########
object_store/src/util.rs:
##########
@@ -170,6 +178,22 @@ fn merge_ranges(
     ret
 }
 
+/// Common implementation for delete_all
+#[allow(dead_code)]
+pub(crate) fn delete_all_helper<'a>(
+    locations: BoxStream<'a, Path>,
+    chunk_size: usize,
+    request_handler: impl Fn(

Review Comment:
   A generic instead of an impl might be easier to read



##########
object_store/src/lib.rs:
##########
@@ -1119,6 +1189,59 @@ mod tests {
             assert_eq!(actual, expected, "{prefix:?} - {offset:?}");
         }
 
+        // Test bulk delete
+        let paths = vec![
+            Path::from("a/a.file"),
+            Path::from("a/a/b.file"),
+            Path::from("aa/a.file"),
+            Path::from("ab/a.file"),
+            Path::from("a/😀.file"),
+        ];
+
+        let out_paths = storage
+            .delete_stream(futures::stream::iter(paths.clone()).boxed())
+            .buffered(5)
+            .map_ok(futures::stream::iter)
+            .try_flatten()
+            .try_collect::<Vec<_>>()
+            .await
+            .unwrap();
+
+        for path in &paths {
+            let err = storage.head(path).await.unwrap_err();
+            assert!(matches!(err, crate::Error::NotFound { .. }), "{}", err);
+        }
+
+        // Some object stores return results out of order (for example, S3)

Review Comment:
   Sorting the output instead of collecting into a HashSet might make for easier to read error messages perhaps



##########
object_store/src/aws/mod.rs:
##########
@@ -249,6 +251,15 @@ impl ObjectStore for AmazonS3 {
         self.client.delete_request(location, &()).await
     }
 
+    fn delete_stream<'a>(
+        &'a self,
+        locations: BoxStream<'a, Path>,
+    ) -> BoxStream<'a, BoxFuture<'a, Result<Vec<Result<Path>>>>> {

Review Comment:
   This signature whilst it does provide the maximum flexibility to the upstreams, is kind of obnoxious to use
   
   What do you think of returning `BoxStream<'a, Result<Path>>` and letting the individual stores control the concurrency, much like we do for coalesce_ranges? 



##########
object_store/src/aws/client.rs:
##########
@@ -129,6 +153,43 @@ struct MultipartPart {
     part_number: usize,
 }
 
+#[derive(Deserialize)]
+#[serde(rename_all = "PascalCase", rename = "DeleteResult")]
+struct BatchDeleteResponse {

Review Comment:
   Given we seem to have enabled a feature to get this to deserialize correctly, could we perhaps get a basic test of this deserialization logic - i.e. given a payload with mixed success and failures, it deserialized correctly?



##########
object_store/src/aws/client.rs:
##########
@@ -243,6 +304,83 @@ impl S3Client {
         Ok(())
     }
 
+    /// Make an S3 Delete Objects request <https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html>
+    ///
+    /// Produces a vector of results, one for each path in the input vector. If
+    /// the delete was successful, the path is returned in the `Ok` variant. If
+    /// there was an error for a certain path, the error will be returned in the
+    /// vector. If there was an issue with making the overall request, an error
+    /// will be returned at the top level.
+    pub async fn bulk_delete_request(
+        &self,
+        paths: Vec<Path>,
+    ) -> Result<Vec<Result<Path>>> {
+        if paths.is_empty() {
+            return Ok(Vec::new());
+        }
+
+        let credential = self.get_credential().await?;
+        let url = format!("{}?delete", self.config.bucket_endpoint);
+
+        let inner_body = paths
+            .iter()
+            .map(|path| format!("<Object><Key>{}</Key></Object>", path))

Review Comment:
   I think this might run into weirdness due to XML escaping, could we use quick-xml to serialize this payload instead of string formatting?
   
   I think Path allows '&' for example, a test of this would be superb...



##########
object_store/src/util.rs:
##########
@@ -170,6 +178,22 @@ fn merge_ranges(
     ret
 }
 
+/// Common implementation for delete_all
+#[allow(dead_code)]
+pub(crate) fn delete_all_helper<'a>(

Review Comment:
   I wonder if we could follow the pattern in #4220 of using extension traits for this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1202501004


##########
object_store/src/lib.rs:
##########
@@ -1472,10 +1579,11 @@ mod tests {
 
     async fn delete_fixtures(storage: &DynObjectStore) {
         let paths = flatten_list_stream(storage, None).await.unwrap();
-
-        for f in &paths {
-            storage.delete(f).await.unwrap();
-        }
+        storage
+            .delete_stream(futures::stream::iter(paths).boxed())

Review Comment:
   I mean the alternative would be to just pass in `Vec<Path>`, but I think if we're going to go the path of providing a streaming interface we should at least make it usable. I would be happy with either tbh, `Vec<Path>` does have the advantage of being simpler... Up to you :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#issuecomment-1519201951

   I think to implement Azure blob store and GCS, we'll need to do work upstream:
   
   https://github.com/seanmonstar/reqwest/issues/1241
   
   They both used the `mixed/multipart` request type and to do proper error handling we'll want to have robust implementations of it. So I'm going to wait on those two.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1163643919


##########
object_store/src/lib.rs:
##########
@@ -367,6 +367,31 @@ pub trait ObjectStore: std::fmt::Display + Send + Sync + Debug + 'static {
     /// Delete the object at the specified location.
     async fn delete(&self, location: &Path) -> Result<()>;
 
+    /// Delete all the objects at the specified locations
+    ///
+    /// When supported, this method will use bulk operations that delete more
+    /// than one object at a time. The maximum number of objects deleted per call
+    /// is provided by [`DELETE_BATCH_SIZE`]. You can use that parameter to wrap
+    /// this method to customize the parallelism or provide a progress indicator.
+    ///
+    /// This method may create multiple threads to perform the deletions in parallel.
+    async fn delete_all(&self, locations: Vec<Path>) -> Result<()> {

Review Comment:
   What do you think about making this instead something like
   
   ```suggestion
       async fn delete_all(&self, locations: Vec<Path>) -> Result<BoxStream<'_, Result<()>>>  {
   ```
   
   This would allow for granular error and progress reporting? One could even go so far as to make the input also a `BoxStream` :thinking: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201520269


##########
object_store/src/aws/client.rs:
##########
@@ -129,6 +153,43 @@ struct MultipartPart {
     part_number: usize,
 }
 
+#[derive(Deserialize)]
+#[serde(rename_all = "PascalCase", rename = "DeleteResult")]
+struct BatchDeleteResponse {

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201937598


##########
object_store/src/limit.rs:
##########
@@ -148,6 +148,14 @@ impl<T: ObjectStore> ObjectStore for LimitStore<T> {
         self.inner.delete(location).await
     }
 
+    fn delete_stream<'a>(
+        &'a self,
+        locations: BoxStream<'a, Path>,
+    ) -> BoxStream<'a, Result<Path>> {
+        // TODO: how do I implement limit stream with this API?

Review Comment:
   ```suggestion
           let _permit = self.semaphore.acquire().await.unwrap();
   ```
   Whilst not 100% accurate, I think it is reasonable and it what we do for get_ranges



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1202512858


##########
object_store/src/lib.rs:
##########
@@ -1578,9 +1578,15 @@ mod tests {
     }
 
     async fn delete_fixtures(storage: &DynObjectStore) {
-        let paths = flatten_list_stream(storage, None).await.unwrap();
+        // let paths = flatten_list_stream(storage, None).await.unwrap();

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1202491273


##########
object_store/src/lib.rs:
##########
@@ -1472,10 +1579,11 @@ mod tests {
 
     async fn delete_fixtures(storage: &DynObjectStore) {
         let paths = flatten_list_stream(storage, None).await.unwrap();
-
-        for f in &paths {
-            storage.delete(f).await.unwrap();
-        }
+        storage
+            .delete_stream(futures::stream::iter(paths).boxed())

Review Comment:
   It makes the API a little funky but that might be fine. In most of my downstream use cases, I'm probably passing in a `Vec` or iterator, so I'm already going to wrap in `future::stream::iter`. Having to add a `.map(Ok)` seems fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201933491


##########
object_store/src/aws/mod.rs:
##########
@@ -249,6 +251,15 @@ impl ObjectStore for AmazonS3 {
         self.client.delete_request(location, &()).await
     }
 
+    fn delete_stream<'a>(
+        &'a self,
+        locations: BoxStream<'a, Path>,
+    ) -> BoxStream<'a, BoxFuture<'a, Result<Vec<Result<Path>>>>> {

Review Comment:
   Neither signature would really let you do this meaningfully, as neither exposes the granularity of the requests. I personally think this is fine, I suspect we may add request limiting to ClientOptions eventually to handle this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#issuecomment-1563332519

   I'll do some final clean ups later today, then it should be ready.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #4060: feat: support bulk deletes in object_store

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#issuecomment-1563314543

   Shall we merge it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201939523


##########
object_store/src/lib.rs:
##########
@@ -1472,10 +1579,11 @@ mod tests {
 
     async fn delete_fixtures(storage: &DynObjectStore) {
         let paths = flatten_list_stream(storage, None).await.unwrap();
-
-        for f in &paths {
-            storage.delete(f).await.unwrap();
-        }
+        storage
+            .delete_stream(futures::stream::iter(paths).boxed())

Review Comment:
   Thinking about this use-case, I wonder if the input should be a fallible stream, what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201939523


##########
object_store/src/lib.rs:
##########
@@ -1472,10 +1579,11 @@ mod tests {
 
     async fn delete_fixtures(storage: &DynObjectStore) {
         let paths = flatten_list_stream(storage, None).await.unwrap();
-
-        for f in &paths {
-            storage.delete(f).await.unwrap();
-        }
+        storage
+            .delete_stream(futures::stream::iter(paths).boxed())

Review Comment:
   Thinking about this use-case, I wonder if the input should be a fallible stream, what do you think? You could then feed the output of the list operation directly into the bulk delete API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#issuecomment-1556071058

   ✅ I've run the integration tests against AWS S3 and Cloudflare R2. Both pass 👍  (although Cloudflare fails the `get_opts` test right now).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1200251376


##########
object_store/src/aws/mod.rs:
##########
@@ -249,6 +251,15 @@ impl ObjectStore for AmazonS3 {
         self.client.delete_request(location, &()).await
     }
 
+    fn delete_stream<'a>(
+        &'a self,
+        locations: BoxStream<'a, Path>,
+    ) -> BoxStream<'a, BoxFuture<'a, Result<Vec<Result<Path>>>>> {

Review Comment:
   This signature whilst it does provide the maximum flexibility to the upstreams, is kind of obnoxious to use
   
   What do you think of returning `BoxStream<'a, Result<Path>>` and letting the individual stores control the concurrency, much like we do for coalesce_ranges? This would have the advantage of letting them choose an appropriate value
   
   It would also avoid overheads for stores that don't support bulk deletes, FWIW



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1202501004


##########
object_store/src/lib.rs:
##########
@@ -1472,10 +1579,11 @@ mod tests {
 
     async fn delete_fixtures(storage: &DynObjectStore) {
         let paths = flatten_list_stream(storage, None).await.unwrap();
-
-        for f in &paths {
-            storage.delete(f).await.unwrap();
-        }
+        storage
+            .delete_stream(futures::stream::iter(paths).boxed())

Review Comment:
   I mean the alternative would be to just pass in `Vec<Path>`, but I think if we're going to go the path of providing a streaming interface we should at least make it usable. I would be happy with either tbh, `Vec<Path>` does have the advantage of being simpler...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] wjones127 commented on a diff in pull request #4060: feat: support bulk deletes in object_store

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #4060:
URL: https://github.com/apache/arrow-rs/pull/4060#discussion_r1201519452


##########
object_store/src/aws/mod.rs:
##########
@@ -249,6 +251,15 @@ impl ObjectStore for AmazonS3 {
         self.client.delete_request(location, &()).await
     }
 
+    fn delete_stream<'a>(
+        &'a self,
+        locations: BoxStream<'a, Path>,
+    ) -> BoxStream<'a, BoxFuture<'a, Result<Vec<Result<Path>>>>> {

Review Comment:
   I think this signature is fine, but how would I implement limit store with it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org