Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/04/14 15:53:00 UTC

[jira] [Updated] (HADOOP-16465) Tune S3AFileSystem.listLocatedStatus

     [ https://issues.apache.org/jira/browse/HADOOP-16465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-16465:
------------------------------------
    Summary: Tune S3AFileSystem.listLocatedStatus  (was: S3AFileSystem.listLocatedStatus improvements/fixes)

> Tune S3AFileSystem.listLocatedStatus
> ------------------------------------
>
>                 Key: HADOOP-16465
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16465
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Mukund Thakur
>            Priority: Major
>
> Looking at logs of LocatedFileStatus/FileInputFormat scans, there's a needless call to getFileStatus whenever an S3AFileSystem.listLocatedStatus() call is made:
> # {{S3AFileSystem.listLocatedStatus()}} first does a getFileStatus call and, if the path is a file, returns that file status.
> # But if you look at all the uses in the MR code in FileInputFormat and LocatedFileStatusFetcher, they only call this method *knowing the destination is a directory*.
> Which means that for every unguarded S3 path there are two needless HEADs and a single-entry LIST before the real LIST is initiated.
> If the S3A FS can assume that a dest is a non-empty directory, then it can go straight to the LIST operation, only falling back to the HEAD (path) + HEAD (path + "/") probes if that fails.
> We could also think about doing the same for listStatus.
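
The request counts described above can be illustrated with a small, self-contained Java sketch. It is not the S3A implementation: FakeStore, probeThenList and listFirst are hypothetical names invented for this example, the store is an in-memory stand-in that only counts HEAD and LIST calls, and the page size of 1000 is an arbitrary illustrative value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

final class ListFirstSketch {

    /** Hypothetical in-memory stand-in for a bucket; counts requests made. */
    static final class FakeStore {
        private final TreeSet<String> keys = new TreeSet<>();
        int heads = 0;
        int lists = 0;

        void put(String key) { keys.add(key); }

        /** Models a HEAD request: does this exact key exist? */
        boolean head(String key) {
            heads++;
            return keys.contains(key);
        }

        /** Models a LIST request: up to maxKeys keys strictly under the prefix. */
        List<String> list(String prefix, int maxKeys) {
            lists++;
            List<String> out = new ArrayList<>();
            for (String k : keys.tailSet(prefix, true)) {
                if (!k.startsWith(prefix) || out.size() >= maxKeys) break;
                if (k.equals(prefix)) continue; // skip the directory marker itself
                out.add(k);
            }
            return out;
        }
    }

    /** Probe first (HEAD, HEAD + "/", single-entry LIST), then do the real LIST. */
    static List<String> probeThenList(FakeStore s, String path) {
        if (s.head(path)) return List.of(path); // the path is a file
        boolean isDir = s.head(path + "/") || !s.list(path + "/", 1).isEmpty();
        if (!isDir) throw new NoSuchElementException(path);
        return s.list(path + "/", 1000);
    }

    /** Assume a non-empty directory: LIST first, probe only if nothing came back. */
    static List<String> listFirst(FakeStore s, String path) {
        List<String> children = s.list(path + "/", 1000);
        if (!children.isEmpty()) return children;
        if (s.head(path)) return List.of(path);   // fallback: it was a file
        if (s.head(path + "/")) return List.of(); // fallback: empty directory marker
        throw new NoSuchElementException(path);
    }

    public static void main(String[] args) {
        FakeStore a = new FakeStore();
        a.put("data/part-0000");
        a.put("data/part-0001");
        probeThenList(a, "data");
        System.out.println("probe-first: HEADs=" + a.heads + " LISTs=" + a.lists);

        FakeStore b = new FakeStore();
        b.put("data/part-0000");
        b.put("data/part-0001");
        listFirst(b, "data");
        System.out.println("list-first:  HEADs=" + b.heads + " LISTs=" + b.lists);
    }
}
```

In this model, a non-empty directory without a marker object costs two HEADs and two LISTs with the probe-first path, and a single LIST with the list-first path. The extra HEADs are only paid when the LIST returns nothing, that is, when the path turns out to be a file, an empty directory, or missing.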


