Posted to common-dev@hadoop.apache.org by "Gabor Bota (Jira)" <ji...@apache.org> on 2020/01/30 10:21:00 UTC

[jira] [Resolved] (HADOOP-16801) S3Guard queries S3 with recursive file listings

     [ https://issues.apache.org/jira/browse/HADOOP-16801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Bota resolved HADOOP-16801.
---------------------------------
    Resolution: Fixed

> S3Guard queries S3 with recursive file listings
> -----------------------------------------------
>
>                 Key: HADOOP-16801
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16801
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Minor
>         Attachments: HADOOP-aws-no-prefetch.prelim.patch
>
>
> S3Guard does not respect an authoritative metadata store when listFiles is used with recursive=true. It queries S3 even when the given directory tree is one level deep with no nested directories and the parent directory listing is authoritative. S3Guard should check the listings in the given directory tree for authoritativeness and not query S3 when all listings in the tree are marked as authoritative in the metadata table (given that the metadata store is configured to be authoritative).
> Below is the description of how the current code works:
> S3AFileSystem#listFiles with the recursive option queries S3 even when the directory listing is authoritative. FileStatusListingIterator is created with the given entries from the metadata store [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L126]. However, FileStatusListingIterator has an ObjectListingIterator that prefetches from S3 regardless of whether the listing is authoritative. We observed this behavior when using DynamoDBMetadataStore.
> In the provided patch, I suppressed the unnecessary S3 calls by passing a dumb listing iterator to the listFiles call. This is obviously not a solution; it only demonstrates the source of the problem.
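
The check proposed in the description (skip S3 only when every listing in the tree is authoritative) can be sketched roughly as follows. This is a minimal illustration, not the S3Guard implementation: the class AuthoritativeTreeCheck, the DirListing type, and the in-memory Map standing in for the metadata store are all hypothetical names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AuthoritativeTreeCheck {

    /** Hypothetical stand-in for one cached directory listing in a metadata store. */
    static final class DirListing {
        final boolean authoritative;
        final List<String> childDirs;

        DirListing(boolean authoritative, List<String> childDirs) {
            this.authoritative = authoritative;
            this.childDirs = childDirs;
        }
    }

    /**
     * Returns true only if the listing for root and for every directory
     * beneath it is present in the store and marked authoritative.
     * A false result means the caller would still have to query S3.
     */
    static boolean isTreeAuthoritative(Map<String, DirListing> store, String root) {
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            DirListing listing = store.get(dir);
            // A missing or non-authoritative listing anywhere in the tree
            // means the cached view cannot be trusted on its own.
            if (listing == null || !listing.authoritative) {
                return false;
            }
            for (String child : listing.childDirs) {
                pending.push(child);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed example data: a one-level tree, as in the reported case.
        Map<String, DirListing> store = new HashMap<>();
        store.put("/table", new DirListing(true, Collections.<String>emptyList()));
        System.out.println("skip S3: " + isTreeAuthoritative(store, "/table"));

        // Add a nested directory whose listing is not authoritative.
        store.put("/table", new DirListing(true, Arrays.asList("/table/part=1")));
        store.put("/table/part=1", new DirListing(false, Collections.<String>emptyList()));
        System.out.println("skip S3: " + isTreeAuthoritative(store, "/table"));
    }
}
```

In this sketch the walk stops at the first directory that is missing or not authoritative, so a single untrusted listing is enough to fall back to S3 for the recursive call.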



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
