Posted to github@arrow.apache.org by "roeap (via GitHub)" <gi...@apache.org> on 2023/02/14 06:57:32 UTC

[GitHub] [arrow-rs] roeap opened a new pull request, #3714: Filter exact list prefix matches for azure gen2 accounts

roeap opened a new pull request, #3714:
URL: https://github.com/apache/arrow-rs/pull/3714

   # Which issue does this PR close?
   
   Part of https://github.com/apache/arrow-rs/issues/3712
   
   Again, there is no straightforward way to test this in CI without a gen2 account, but I tested it locally.
   
   # Rationale for this change
    
   
   # What changes are included in this PR?
   
   
   # Are there any user-facing changes?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3714: Filter exact list prefix matches for azure gen2 accounts

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714#issuecomment-1429524010

   > Would it be possible to cut another object store release soon?
   
   I can look to cut a release once this is in




[GitHub] [arrow-rs] roeap commented on pull request #3714: Filter exact list prefix matches for azure gen2 accounts

Posted by "roeap (via GitHub)" <gi...@apache.org>.
roeap commented on PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714#issuecomment-1429255411

   I know the latest object store release is just a week old, but there are some fixes and features on master I am keen to integrate. Would it be possible to cut another object store release soon?




[GitHub] [arrow-rs] roeap commented on pull request #3714: Filter exact list prefix matches for azure gen2 accounts

Posted by "roeap (via GitHub)" <gi...@apache.org>.
roeap commented on PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714#issuecomment-1429557569

   I don't seem to be able to restart the failing tests, but it looks to me like just a bit of flakiness in connecting to localstack.




[GitHub] [arrow-rs] ursabot commented on pull request #3714: Filter exact list prefix matches for azure gen2 accounts

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714#issuecomment-1429649885

   Benchmark runs are scheduled for baseline = 38a79ae4e4bff70b3d74f7582f9c4f4dbff62b69 and contender = ef00365eeffa0af5cbbb2e44ac219e2c0c384fa2. ef00365eeffa0af5cbbb2e44ac219e2c0c384fa2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/5389ab49908c4144b4ec1e118637d500...bfe2d3a70fe94752814f2732cf32789d/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/e72a46c5cb86469c8b14c2c635d2911b...b8c9b31e7a0946f98e9f7792d97885d8/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd0d838194b84e2e8bab419587cf6ee5...f37307b899df42e7a52a83ec31c668a0/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/aa0f140c6bd84fafb0d0ac99080abad4...9358bd7595b94f248f9cbe0760050acc/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3714: Filter exact list prefix matches for azure gen2 accounts

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714#discussion_r1105623095


##########
object_store/src/azure/client.rs:
##########
@@ -419,33 +419,37 @@ struct ListResultInternal {
     pub blobs: Blobs,
 }
 
-impl TryFrom<ListResultInternal> for ListResult {
-    type Error = crate::Error;
-
-    fn try_from(value: ListResultInternal) -> Result<Self> {
-        let common_prefixes = value
-            .blobs
-            .blob_prefix
-            .into_iter()
-            .map(|x| Ok(Path::parse(x.name)?))
-            .collect::<Result<_>>()?;
-
-        let objects = value
-            .blobs
-            .blobs
-            .into_iter()
-            .map(ObjectMeta::try_from)
-            // Note: workaround for gen2 accounts with hierarchical namespaces. These accounts also
-            // return path segments as "directories". When we want directories, it's always via
-            // the BlobPrefix mechanics.
-            .filter_map_ok(|obj| if obj.size > 0 { Some(obj) } else { None })
-            .collect::<Result<_>>()?;
-
-        Ok(Self {
-            common_prefixes,
-            objects,
+fn to_list_result(value: ListResultInternal, prefix: Option<&str>) -> Result<ListResult> {
+    let prefix = prefix.map(Path::from).unwrap_or_else(Path::default);
+    let common_prefixes = value
+        .blobs
+        .blob_prefix
+        .into_iter()
+        .map(|x| Ok(Path::parse(x.name)?))
+        .collect::<Result<_>>()?;
+
+    let objects = value
+        .blobs
+        .blobs
+        .into_iter()
+        .map(ObjectMeta::try_from)
+        // Note: workaround for gen2 accounts with hierarchical namespaces. These accounts also
+        // return path segments as "directories" and include blobs in list requests with prefix,
+        // if the prefix matches the blob. When we want directories, it's always via
+        // the BlobPrefix mechanics, and during lists we state that prefixes are evaluated on a path segment basis.
+        .filter_map_ok(|obj| {
+            if obj.size > 0 && obj.location != prefix {

Review Comment:
   ```suggestion
               if obj.size > 0 && obj.location.as_ref().len() > prefix.as_ref().len() {
   ```
   Should be cheaper and equivalent
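The following is a minimal sketch, not the object_store implementation, of why the suggested length check can replace the equality check. It assumes that every location Azure returns for a prefixed list begins with that prefix; under that assumption a location differs from the prefix exactly when it is longer. `Entry`, `keep_by_equality`, `keep_by_length`, `filter_listing`, `sample_listing` and the sample paths are all hypothetical, and plain `String`s stand in for `Path`.

```rust
// Hypothetical stand-in for object_store's `ObjectMeta`, keeping only the two
// fields the filter inspects. Plain `String`s replace `Path` for brevity.
#[derive(Debug, Clone)]
struct Entry {
    location: String,
    size: usize,
}

/// Predicate as written in the PR: drop zero-sized "directory" entries and
/// the entry whose location equals the prefix itself.
fn keep_by_equality(e: &Entry, prefix: &str) -> bool {
    e.size > 0 && e.location != prefix
}

/// Predicate from the review suggestion: compare lengths instead of contents.
/// Equivalent to `keep_by_equality` only if every location starts with `prefix`.
fn keep_by_length(e: &Entry, prefix: &str) -> bool {
    e.size > 0 && e.location.len() > prefix.len()
}

/// Applies a predicate to a listing and returns the surviving locations in order.
fn filter_listing(listing: &[Entry], prefix: &str, keep: fn(&Entry, &str) -> bool) -> Vec<String> {
    listing
        .iter()
        .filter(|&e| keep(e, prefix))
        .map(|e| e.location.clone())
        .collect()
}

/// Assumed shape of a gen2 response for prefix "data/2023": every name starts
/// with the prefix, and directories show up as zero-sized entries.
fn sample_listing() -> Vec<Entry> {
    let entry = |location: &str, size: usize| Entry { location: location.to_string(), size };
    vec![
        entry("data/2023", 10),     // exact match on the prefix
        entry("data/2023/jan", 0),  // directory reported as an entry
        entry("data/2023/jan/a.parquet", 42),
        entry("data/2023/b.parquet", 7),
    ]
}

fn main() {
    let listing = sample_listing();
    let by_eq = filter_listing(&listing, "data/2023", keep_by_equality);
    let by_len = filter_listing(&listing, "data/2023", keep_by_length);

    // Both predicates keep only the two real objects below the prefix.
    assert_eq!(by_eq, by_len);
    println!("{:?}", by_eq); // ["data/2023/jan/a.parquet", "data/2023/b.parquet"]
}
```

The length check avoids a byte-by-byte comparison, but it leans on the starts-with assumption: an entry such as `data/2024` listed under `data/2023` would be kept by the equality check and dropped by the length check.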





[GitHub] [arrow-rs] tustvold merged pull request #3714: Filter exact list prefix matches for azure gen2 accounts

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3714:
URL: https://github.com/apache/arrow-rs/pull/3714

