Posted to dev@arrow.apache.org by "Francois Saint-Jacques (Jira)" <ji...@apache.org> on 2020/05/21 18:11:00 UTC

[jira] [Created] (ARROW-8884) [C++] Listing files with S3FileSystem is slow

Francois Saint-Jacques created ARROW-8884:
---------------------------------------------

             Summary: [C++] Listing files with S3FileSystem is slow
                 Key: ARROW-8884
                 URL: https://issues.apache.org/jira/browse/ARROW-8884
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Francois Saint-Jacques


Listing files on S3 is slow due to the recursive nature of the algorithm.

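The following change modifies the behavior of the S3Result to include all objects but no "grouping" (directories). This dramatically lowers the number of HTTP calls, because one paginated listing returns every key under the prefix instead of one request per directory.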
{code:c++}
diff --git a/cpp/src/arrow/filesystem/s3fs.cc b/cpp/src/arrow/filesystem/s3fs.cc
index 70c87f46ec..98a40b17a2 100644
--- a/cpp/src/arrow/filesystem/s3fs.cc
+++ b/cpp/src/arrow/filesystem/s3fs.cc
@@ -986,7 +986,7 @@ class S3FileSystem::Impl {
     if (!prefix.empty()) {
       req.SetPrefix(ToAwsString(prefix) + kSep);
     }
-    req.SetDelimiter(Aws::String() + kSep);
+    // req.SetDelimiter(Aws::String() + kSep);
     req.SetMaxKeys(kListObjectsMaxKeys);
 
     while (true) {

{code}



