Posted to common-issues@hadoop.apache.org by "Steven Rand (JIRA)" <ji...@apache.org> on 2017/09/12 02:27:00 UTC

[jira] [Commented] (HADOOP-13712) S3A open to avoid needless HEAD on the successful execution path

    [ https://issues.apache.org/jira/browse/HADOOP-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162390#comment-16162390 ] 

Steven Rand commented on HADOOP-13712:
--------------------------------------

[~stevel@apache.org], I'm wondering whether it would be reasonable to add a new method to S3AFileSystem which is similar to {{open()}}, except that:

* The caller is responsible for providing the length of the file.
* The caller accepts that not all guarantees of {{FileSystem.open}} apply, i.e., we won't raise an FNFE if the file doesn't exist.
* We don't call {{getFileStatus}}, and instead just use the given length when constructing the S3AInputStream.

That way most callers can continue to call {{S3AFileSystem.open}} unaffected, while callers who already know the file's length and accept the weaker guarantees can use the new method and skip the {{getFileStatus}} call. The use case I have in mind is applications that already query an external metastore or catalog before reading a file, and get the file's length and existence from there.
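
To make the idea concrete, here is a minimal, self-contained sketch of the two code paths. It does not use the real Hadoop classes: {{ToyObjectStore}}, {{LazyObjectStream}}, {{ToyFileSystem}} and {{openWithKnownLength}} are all hypothetical names invented for illustration, and the lazy GET on first read is an assumption of the toy model, not a statement about how S3AInputStream is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an object store; counts HEAD-like probes. */
class ToyObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    int headCalls = 0;

    void put(String key, byte[] data) {
        objects.put(key, data);
    }

    /** HEAD-like probe: returns the object length, or throws if the key is absent. */
    long head(String key) throws FileNotFoundException {
        headCalls++;
        byte[] data = objects.get(key);
        if (data == null) {
            throw new FileNotFoundException(key);
        }
        return data.length;
    }

    /** GET-like read: a missing key is only detected when this is actually called. */
    InputStream get(String key) throws FileNotFoundException {
        byte[] data = objects.get(key);
        if (data == null) {
            throw new FileNotFoundException(key);
        }
        return new ByteArrayInputStream(data);
    }
}

/** Hypothetical stream that defers the GET until the first read. */
class LazyObjectStream extends InputStream {
    private final ToyObjectStore store;
    private final String key;
    private final long length;
    private InputStream in;

    LazyObjectStream(ToyObjectStore store, String key, long length) {
        this.store = store;
        this.key = key;
        this.length = length;
    }

    long getContentLength() {
        return length;
    }

    @Override
    public int read() throws IOException {
        if (in == null) {
            in = store.get(key); // first read triggers the GET
        }
        return in.read();
    }
}

/** Hypothetical file system facade showing both ways of opening a file. */
class ToyFileSystem {
    private final ToyObjectStore store;

    ToyFileSystem(ToyObjectStore store) {
        this.store = store;
    }

    /** Regular open: one metadata probe up front, so a missing file fails here. */
    LazyObjectStream open(String key) throws IOException {
        long length = store.head(key);
        return new LazyObjectStream(store, key, length);
    }

    /** Proposed variant: trusts the caller's length and makes no probe at all. */
    LazyObjectStream openWithKnownLength(String key, long length) {
        return new LazyObjectStream(store, key, length);
    }
}
```

In this toy model, {{openWithKnownLength}} never touches the store, so a missing file surfaces as a FileNotFoundException on the first {{read()}} rather than at open time. That is the weaker guarantee the caller would be accepting.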

Do you think this would be a reasonable addition? If so I'm happy to submit a patch.

> S3A open to avoid needless HEAD on the successful execution path
> ----------------------------------------------------------------
>
>                 Key: HADOOP-13712
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13712
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>
> S3A's open() operation does a {{getFileStatus()}} check to see if a file is not a directory before opening with a GET. That initial check will take up at least one HEAD request if the file is present, more if it isn't.
> As the GET itself performs the existence check, it is needless. A successful GET of a path which doesn't end in "/" means a file was there. The only reason a getFileStatus call is needed is to choose which error message to display if the path isn't there: is it an FNFE or is it path-is-directory.
> Proposed: reorder the code to do the GET; only if that fails, fall back to getFileStatus()
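
For reference, the reordering proposed in the issue description can be sketched as follows. This is a toy model only: {{GetFirstStore}} and its methods are hypothetical names, {{isDirectory}} is a simplified stand-in for {{getFileStatus()}}, and nothing here is taken from the actual S3A code.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical store illustrating "GET first, status probe only on failure". */
class GetFirstStore {
    final Map<String, byte[]> files = new HashMap<>();
    final Set<String> directories = new HashSet<>();
    int statusProbes = 0;

    /** GET-like call: returns null when no object exists at the key. */
    InputStream get(String key) {
        byte[] data = files.get(key);
        return data == null ? null : new ByteArrayInputStream(data);
    }

    /** Simplified stand-in for getFileStatus(): reports only directory-ness. */
    boolean isDirectory(String key) {
        statusProbes++;
        return directories.contains(key);
    }

    /** Try the GET first; probe status only to choose which error to raise. */
    InputStream open(String key) throws IOException {
        InputStream in = get(key);
        if (in != null) {
            return in; // success path: no status probe was needed
        }
        if (isDirectory(key)) {
            throw new IOException("Path is a directory: " + key);
        }
        throw new FileNotFoundException("No such file: " + key);
    }
}
```

On the successful path the status probe count stays at zero; the extra lookup is paid only when the GET fails and the caller needs to know whether the path is missing or is a directory.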


