Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2020/12/02 12:42:00 UTC
[jira] [Commented] (ARROW-8884) [C++] Listing files with S3FileSystem is slow
[ https://issues.apache.org/jira/browse/ARROW-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242329#comment-17242329 ]
Antoine Pitrou commented on ARROW-8884:
---------------------------------------
Related: ARROW-10788
> [C++] Listing files with S3FileSystem is slow
> ---------------------------------------------
>
> Key: ARROW-8884
> URL: https://issues.apache.org/jira/browse/ARROW-8884
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Francois Saint-Jacques
> Priority: Major
> Labels: filesystem
>
> Listing files on S3 is slow due to the recursive nature of the algorithm.
> The following change modifies the behavior of the S3Result to include all objects but no "grouping" (directories). This dramatically lowers the number of HTTP calls.
> {code:c++}
> diff --git a/cpp/src/arrow/filesystem/s3fs.cc b/cpp/src/arrow/filesystem/s3fs.cc
> index 70c87f46ec..98a40b17a2 100644
> --- a/cpp/src/arrow/filesystem/s3fs.cc
> +++ b/cpp/src/arrow/filesystem/s3fs.cc
> @@ -986,7 +986,7 @@ class S3FileSystem::Impl {
>      if (!prefix.empty()) {
>        req.SetPrefix(ToAwsString(prefix) + kSep);
>      }
> -    req.SetDelimiter(Aws::String() + kSep);
> +    // req.SetDelimiter(Aws::String() + kSep);
>      req.SetMaxKeys(kListObjectsMaxKeys);
>
>      while (true) {
> {code}
> The suggested change is to add an option to Selector, e.g. `no_directory_result` or something similar.
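To make the request-count argument in the description concrete, here is a minimal sketch that compares the two listing strategies against an in-memory stand-in for a bucket. Everything in it is an assumption for illustration: `MockBucket`, `SketchSelector`, `ListRecursive` and `List` are hypothetical names, not Arrow or AWS SDK APIs, and pagination (`SetMaxKeys`) is ignored. The `no_directory_result` field only mirrors the option name floated in the description.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for an S3 bucket (not Arrow, not the AWS SDK).
struct MockBucket {
  std::set<std::string> keys;  // object keys, kept sorted
  int request_count = 0;       // number of simulated list requests

  struct Result {
    std::vector<std::string> contents;         // object keys returned directly
    std::vector<std::string> common_prefixes;  // "directories", only with a delimiter
  };

  // Simulates one list request. An empty delimiter means no grouping:
  // every key under the prefix is returned. Pagination is ignored here.
  Result ListObjects(const std::string& prefix, const std::string& delimiter) {
    ++request_count;
    Result res;
    std::set<std::string> seen_prefixes;
    for (const auto& key : keys) {
      if (key.compare(0, prefix.size(), prefix) != 0) continue;  // not under prefix
      if (!delimiter.empty()) {
        auto pos = key.find(delimiter, prefix.size());
        if (pos != std::string::npos) {
          // Key lives in a sub-"directory": report only the grouping prefix.
          seen_prefixes.insert(key.substr(0, pos + delimiter.size()));
          continue;
        }
      }
      res.contents.push_back(key);
    }
    res.common_prefixes.assign(seen_prefixes.begin(), seen_prefixes.end());
    return res;
  }
};

// Delimiter-based walk: one request per "directory" level that is visited.
void ListRecursive(MockBucket& bucket, const std::string& prefix,
                   std::vector<std::string>* out) {
  assert(prefix.empty() || prefix.back() == '/');
  auto res = bucket.ListObjects(prefix, "/");
  out->insert(out->end(), res.contents.begin(), res.contents.end());
  for (const auto& dir : res.common_prefixes) {
    ListRecursive(bucket, dir, out);
  }
}

// Hypothetical selector carrying the option suggested in the description.
struct SketchSelector {
  std::string base_dir;              // prefix, empty or ending in '/'
  bool no_directory_result = false;  // true: skip grouping and list flat
};

std::vector<std::string> List(MockBucket& bucket, const SketchSelector& sel) {
  if (sel.no_directory_result) {
    // No delimiter: a single request returns every key under the prefix.
    return bucket.ListObjects(sel.base_dir, "").contents;
  }
  std::vector<std::string> out;
  ListRecursive(bucket, sel.base_dir, &out);
  return out;
}
```

With keys such as `5`, `a/1`, `a/2`, `a/b/3` and `c/4`, the delimiter-based walk issues one simulated request per directory visited, while the flat variant issues a single one. A real bucket would still page through results in batches of at most `kListObjectsMaxKeys`, so the saving there is in per-directory round trips rather than in pagination.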