You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/15 22:17:34 UTC

[GitHub] [arrow] coryan opened a new pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

coryan opened a new pull request #11969:
URL: https://github.com/apache/arrow/pull/11969


   Recursive listing without limit is about as expensive as only listing
   the top-level directories in GCS. Therefore, it is just *more* efficient
   to filter the results on the client-side, as this requires fewer request
   than listing only only 0, 1, or N levels in a directory hierarchy.
   
   I also improved the tests to verify no objects with similar prefixes are
   returned, for example, when listing objects starting with 'aaa' we do
   not want 'aaab', but we want 'aaa/b'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#issuecomment-997967341


   Benchmark runs are scheduled for baseline = e7c2ead855364e74a39154239e547582a4f2754b and contender = 61745beabf1974a729687011aeabe17fdaedb354. 61745beabf1974a729687011aeabe17fdaedb354 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a74e22b500b248059d1ce135a29fa6f6...2f4044bda1e24610ab6f978468fb6034/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/153bc9e19aa844159ec3ec88ad0ed41c...47392345bb774762a2382350b00ba1db/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/276efab16ab54f77948da1b1856e4e7d...f67607061d764ad8957c7c3ff2ec67a9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#issuecomment-997967341


   Benchmark runs are scheduled for baseline = e7c2ead855364e74a39154239e547582a4f2754b and contender = 61745beabf1974a729687011aeabe17fdaedb354. 61745beabf1974a729687011aeabe17fdaedb354 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a74e22b500b248059d1ce135a29fa6f6...2f4044bda1e24610ab6f978468fb6034/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/153bc9e19aa844159ec3ec88ad0ed41c...47392345bb774762a2382350b00ba1db/)
   [Finished :arrow_down:0.31% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/276efab16ab54f77948da1b1856e4e7d...f67607061d764ad8957c7c3ff2ec67a9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #11969:
URL: https://github.com/apache/arrow/pull/11969


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#issuecomment-995257542


   https://issues.apache.org/jira/browse/ARROW-15121


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#discussion_r772377899



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -604,6 +650,7 @@ TEST_F(GcsIntegrationTest, DeleteDirContentsSuccess) {
   arrow::fs::AssertFileInfo(fs.get(), PreexistingBucketPath(), FileType::Directory);
   arrow::fs::AssertFileInfo(fs.get(), PreexistingObjectPath(), FileType::File);
   for (auto const& info : hierarchy.contents) {
+    if (!fs::internal::IsAncestorOf(hierarchy.base_dir, info.path())) continue;

Review comment:
       Fixed, sorry about that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#discussion_r772358153



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -591,6 +634,7 @@ TEST_F(GcsIntegrationTest, DeleteDirSuccess) {
   arrow::fs::AssertFileInfo(fs.get(), PreexistingBucketPath(), FileType::Directory);
   arrow::fs::AssertFileInfo(fs.get(), PreexistingObjectPath(), FileType::File);
   for (auto const& info : hierarchy.contents) {
+    if (!fs::internal::IsAncestorOf(hierarchy.base_dir, info.path())) continue;

Review comment:
       Done.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#discussion_r772358099



##########
File path: cpp/src/arrow/filesystem/gcsfs_internal.cc
##########
@@ -258,6 +259,10 @@ Result<std::shared_ptr<const KeyValueMetadata>> FromObjectMetadata(
   return result;
 }
 
+std::size_t Depth(arrow::util::string_view path) {

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#discussion_r772360449



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -604,6 +650,7 @@ TEST_F(GcsIntegrationTest, DeleteDirContentsSuccess) {
   arrow::fs::AssertFileInfo(fs.get(), PreexistingBucketPath(), FileType::Directory);
   arrow::fs::AssertFileInfo(fs.get(), PreexistingObjectPath(), FileType::File);
   for (auto const& info : hierarchy.contents) {
+    if (!fs::internal::IsAncestorOf(hierarchy.base_dir, info.path())) continue;

Review comment:
       Can you do the same change as well here? (check the expected type)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#issuecomment-997967341


   Benchmark runs are scheduled for baseline = e7c2ead855364e74a39154239e547582a4f2754b and contender = 61745beabf1974a729687011aeabe17fdaedb354. 61745beabf1974a729687011aeabe17fdaedb354 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a74e22b500b248059d1ce135a29fa6f6...2f4044bda1e24610ab6f978468fb6034/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/153bc9e19aa844159ec3ec88ad0ed41c...47392345bb774762a2382350b00ba1db/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/276efab16ab54f77948da1b1856e4e7d...f67607061d764ad8957c7c3ff2ec67a9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#discussion_r772306994



##########
File path: cpp/src/arrow/filesystem/gcsfs_internal.cc
##########
@@ -258,6 +259,10 @@ Result<std::shared_ptr<const KeyValueMetadata>> FromObjectMetadata(
   return result;
 }
 
+std::size_t Depth(arrow::util::string_view path) {

Review comment:
       Nit, but as per the [coding guidelines](https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci) (which are copied over from the Google C++ style guide :-)), we generally use explicit-width types such as `int32_t`.

##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -591,6 +634,7 @@ TEST_F(GcsIntegrationTest, DeleteDirSuccess) {
   arrow::fs::AssertFileInfo(fs.get(), PreexistingBucketPath(), FileType::Directory);
   arrow::fs::AssertFileInfo(fs.get(), PreexistingObjectPath(), FileType::File);
   for (auto const& info : hierarchy.contents) {
+    if (!fs::internal::IsAncestorOf(hierarchy.base_dir, info.path())) continue;

Review comment:
       You could actually check that this entry still exists instead of ignoring it.
   (same comment below too)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11969: ARROW-15121: [C++] Implement max recursion on GcsFileSystem

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11969:
URL: https://github.com/apache/arrow/pull/11969#issuecomment-997967341


   Benchmark runs are scheduled for baseline = e7c2ead855364e74a39154239e547582a4f2754b and contender = 61745beabf1974a729687011aeabe17fdaedb354. 61745beabf1974a729687011aeabe17fdaedb354 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/a74e22b500b248059d1ce135a29fa6f6...2f4044bda1e24610ab6f978468fb6034/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/153bc9e19aa844159ec3ec88ad0ed41c...47392345bb774762a2382350b00ba1db/)
   [Finished :arrow_down:0.31% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/276efab16ab54f77948da1b1856e4e7d...f67607061d764ad8957c7c3ff2ec67a9/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org