Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/02/14 10:55:11 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3714: Filter exact list prefix matches for azure gen2 accounts

tustvold commented on code in PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714#discussion_r1105623095


##########
object_store/src/azure/client.rs:
##########
@@ -419,33 +419,37 @@ struct ListResultInternal {
     pub blobs: Blobs,
 }
 
-impl TryFrom<ListResultInternal> for ListResult {
-    type Error = crate::Error;
-
-    fn try_from(value: ListResultInternal) -> Result<Self> {
-        let common_prefixes = value
-            .blobs
-            .blob_prefix
-            .into_iter()
-            .map(|x| Ok(Path::parse(x.name)?))
-            .collect::<Result<_>>()?;
-
-        let objects = value
-            .blobs
-            .blobs
-            .into_iter()
-            .map(ObjectMeta::try_from)
-            // Note: workaround for gen2 accounts with hierarchical namespaces. These accounts also
-            // return path segments as "directories". When we cant directories, its always via
-            // the BlobPrefix mechanics.
-            .filter_map_ok(|obj| if obj.size > 0 { Some(obj) } else { None })
-            .collect::<Result<_>>()?;
-
-        Ok(Self {
-            common_prefixes,
-            objects,
+fn to_list_result(value: ListResultInternal, prefix: Option<&str>) -> Result<ListResult> {
+    let prefix = prefix.map(Path::from).unwrap_or_else(Path::default);
+    let common_prefixes = value
+        .blobs
+        .blob_prefix
+        .into_iter()
+        .map(|x| Ok(Path::parse(x.name)?))
+        .collect::<Result<_>>()?;
+
+    let objects = value
+        .blobs
+        .blobs
+        .into_iter()
+        .map(ObjectMeta::try_from)
+        // Note: workaround for gen2 accounts with hierarchical namespaces. These accounts also
+        // return path segments as "directories" and include blobs in list requests with prefix,
+        // if the prefix matches the blob. When we want directories, it's always via
+        // the BlobPrefix mechanics, and during lists we state that prefixes are evaluated on a path segment basis.
+        .filter_map_ok(|obj| {
+            if obj.size > 0 && obj.location != prefix {

Review Comment:
   ```suggestion
               if obj.size > 0 && obj.location.as_ref().len() > prefix.as_ref().len() {
   ```
   This should be cheaper, and equivalent as long as every listed location starts with the prefix.
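
To make the equivalence concrete, here is a minimal sketch that models locations and the prefix as plain `&str` values rather than `object_store::path::Path`. The function names `keep_by_equality` and `keep_by_length` are hypothetical and not part of the crate. The sketch assumes the service only returns locations that start with the requested prefix, in which case a location can equal the prefix only when the two have the same length.

```rust
/// Hypothetical stand-in for the filter in the diff: keep an entry only if it
/// is non-empty and its location is not the prefix itself.
fn keep_by_equality(location: &str, size: usize, prefix: &str) -> bool {
    size > 0 && location != prefix
}

/// The suggested variant: compare byte lengths instead of full contents.
/// Assumption: `location` always starts with `prefix`, so it differs from the
/// prefix exactly when it is strictly longer.
fn keep_by_length(location: &str, size: usize, prefix: &str) -> bool {
    size > 0 && location.len() > prefix.len()
}

/// Apply one of the filters to a listing of (location, size) pairs and return
/// the locations that survive.
fn filter_listing<'a>(
    entries: &[(&'a str, usize)],
    prefix: &str,
    keep: fn(&str, usize, &str) -> bool,
) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|(location, size)| keep(location, *size, prefix))
        .map(|(location, _)| *location)
        .collect()
}
```

The two filters only diverge when a location does not start with the prefix (for example location `x` with prefix `a/b`), which is why the equivalence depends on the list response honouring the prefix.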


