Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/04/27 18:29:00 UTC

[jira] [Commented] (HADOOP-16202) Enhance openFile() for better read performance against object stores

    [ https://issues.apache.org/jira/browse/HADOOP-16202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528961#comment-17528961 ] 

Steve Loughran commented on HADOOP-16202:
-----------------------------------------

This is in branch-3.3.

Going to reopen and pull out the distcp subtask, as this PR doesn't yet fix distcp to use the passed-in file status when opening files, or to set the read policy. Doing that would slightly speed up S3A file opening and, by fixing the read policy, ensure distcp stays performant even if the cluster-wide setting is for random IO.

> Enhance openFile() for better read performance against object stores 
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-16202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16202
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3, tools/distcp
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.4
>
>          Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> The {{openFile()}} builder API lets us add new options when reading a file.
> Add an option {{"fs.s3a.open.option.length"}} which takes a long and allows the length of the file to be declared. If set, *no check for the existence of the file is issued when opening the file*.
> Also: change withFileStatus() to take any FileStatus implementation, rather than only S3AFileStatus, and not check that the path matches the path being opened. This is needed to support viewFS-style wrapping and mounting.
> Adopt these where appropriate, so that clusters with S3A reads switched to random IO stop killing download/localization performance:
> * fs shell copyToLocal
> * distcp
> * IOUtils.copy
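
To illustrate the idea in the description, here is a minimal, self-contained sketch of how a declared length can let an open call skip the existence probe, and how a per-open read policy can travel with the same builder. It is a toy model, not Hadoop code: FakeStore, OpenBuilder, OpenedFile and the option name "example.read.policy" are all hypothetical, and only the option name "fs.s3a.open.option.length" comes from the issue text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Toy model only: none of these types are Hadoop classes. */
public class OpenFileSketch {

    /** Option name taken from the issue description. */
    static final String OPT_LENGTH = "fs.s3a.open.option.length";
    /** Hypothetical option name, used here only to carry a read policy. */
    static final String OPT_READ_POLICY = "example.read.policy";

    /** Hypothetical object store that counts its existence probes. */
    static class FakeStore {
        private final Map<String, Long> objects = new HashMap<>();
        int headRequests = 0;

        void put(String key, long length) {
            objects.put(key, length);
        }

        /** Simulates the round trip that checks existence and fetches the length. */
        long head(String key) {
            headRequests++;
            Long length = objects.get(key);
            if (length == null) {
                throw new IllegalStateException("No such object: " + key);
            }
            return length;
        }
    }

    /** What the caller ends up with: a known length and a read policy. */
    static class OpenedFile {
        final long length;
        final String readPolicy;

        OpenedFile(long length, String readPolicy) {
            this.length = length;
            this.readPolicy = readPolicy;
        }
    }

    /** Hypothetical builder that collects options before opening. */
    static class OpenBuilder {
        private final FakeStore store;
        private final String key;
        private OptionalLong declaredLength = OptionalLong.empty();
        private String readPolicy = "default";

        OpenBuilder(FakeStore store, String key) {
            this.store = store;
            this.key = key;
        }

        /** Records a known option; unknown option names are ignored. */
        OpenBuilder opt(String name, String value) {
            if (OPT_LENGTH.equals(name)) {
                declaredLength = OptionalLong.of(Long.parseLong(value));
            } else if (OPT_READ_POLICY.equals(name)) {
                readPolicy = value;
            }
            return this;
        }

        /** Probes the store only when the caller did not declare a length. */
        OpenedFile build() {
            long length = declaredLength.isPresent()
                ? declaredLength.getAsLong()
                : store.head(key);
            return new OpenedFile(length, readPolicy);
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.put("data/part-0000", 1024L);

        // No length declared: the open costs one probe.
        OpenedFile probed = new OpenBuilder(store, "data/part-0000").build();
        System.out.println("length=" + probed.length
            + " probes so far=" + store.headRequests);

        // Length declared: the open issues no further probe.
        OpenedFile declared = new OpenBuilder(store, "data/part-0000")
            .opt(OPT_LENGTH, "1024")
            .opt(OPT_READ_POLICY, "sequential")
            .build();
        System.out.println("length=" + declared.length
            + " policy=" + declared.readPolicy
            + " probes so far=" + store.headRequests);
    }
}
```

A caller that already holds a directory listing, as a copy tool would, knows each file's length up front, which is why passing it in saves one round trip per file in this model. In the same way, a tool that reads whole files can ask for a sequential policy per open rather than inheriting a cluster-wide random IO setting.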


