Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2020/12/02 12:42:00 UTC

[jira] [Commented] (ARROW-8884) [C++] Listing files with S3FileSystem is slow

    [ https://issues.apache.org/jira/browse/ARROW-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242329#comment-17242329 ] 

Antoine Pitrou commented on ARROW-8884:
---------------------------------------

Related: ARROW-10788

> [C++] Listing files with S3FileSystem is slow
> ---------------------------------------------
>
>                 Key: ARROW-8884
>                 URL: https://issues.apache.org/jira/browse/ARROW-8884
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Francois Saint-Jacques
>            Priority: Major
>              Labels: filesystem
>
> Listing files on S3 is slow due to the recursive nature of the algorithm.
> The following change modifies the behavior of the S3Result to include all objects but no "grouping" (directories). This dramatically lowers the number of HTTP calls.
> {code:c++}
> diff --git a/cpp/src/arrow/filesystem/s3fs.cc b/cpp/src/arrow/filesystem/s3fs.cc
> index 70c87f46ec..98a40b17a2 100644
> --- a/cpp/src/arrow/filesystem/s3fs.cc
> +++ b/cpp/src/arrow/filesystem/s3fs.cc
> @@ -986,7 +986,7 @@ class S3FileSystem::Impl {
>      if (!prefix.empty()) {
>        req.SetPrefix(ToAwsString(prefix) + kSep);
>      }
> -    req.SetDelimiter(Aws::String() + kSep);
> +    // req.SetDelimiter(Aws::String() + kSep);
>      req.SetMaxKeys(kListObjectsMaxKeys);
>  
>      while (true) {
> {code}
> The suggested change is to add an option to Selector, e.g. `no_directory_result` or something similar.


