Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/12/12 14:05:00 UTC

[jira] [Commented] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file

    [ https://issues.apache.org/jira/browse/HADOOP-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248357#comment-17248357 ] 

Steve Loughran commented on HADOOP-15875:
-----------------------------------------

Note that with the openFile() changes of HADOOP-16202, I'd been hoping that passing the split start/end in to a file open would be enough to fix the content length. But it isn't, as reader code assumes it's OK to read past the end of a split if the end of the split < EOF.

In HADOOP-17415 I'm wondering whether we should look at the Content-Range header of any response and use that to dynamically determine the full length of a file. Do a full GET and the length == range, so we can fix the length there. Do a partial read and we will be able to update the length and so know what the final EOF is.
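
As a rough illustration of the Content-Range idea, here is a minimal sketch of pulling the full object length out of a header value such as "bytes 0-99/1234". The class and method names (ContentRangeLength, totalLength) are hypothetical and are not taken from S3AInputStream or any Hadoop patch; it assumes the header follows the standard HTTP "bytes first-last/total" form, where the total may be "*" when the server does not know it.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the S3A connector. */
final class ContentRangeLength {

  // Accepts "bytes <first>-<last>/<total>" and "bytes */<total>".
  // <total> may be "*" when the complete length is unknown.
  private static final Pattern CONTENT_RANGE =
      Pattern.compile("^bytes (?:\\d+-\\d+|\\*)/(\\d+|\\*)$");

  private ContentRangeLength() {
  }

  /**
   * Extract the complete length from a Content-Range header value.
   *
   * @param header header value; may be null if the response had none
   * @return the total length, or empty if absent, unknown or unparseable
   */
  static OptionalLong totalLength(String header) {
    if (header == null) {
      return OptionalLong.empty();
    }
    Matcher m = CONTENT_RANGE.matcher(header.trim());
    if (!m.matches() || "*".equals(m.group(1))) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(m.group(1)));
    } catch (NumberFormatException e) {
      // More digits than fit in a long: treat as unknown.
      return OptionalLong.empty();
    }
  }
}
```

A stream could call something like this after each GET and, when a value is present, record it as the known file length; an empty result would simply leave the length undetermined until a later response supplies it.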

Together this should allow workers given a filename and split range to open and read data past the split end if need be, without the need for any HEAD at open time. All store IO would be postponed until that first GET.

> S3AInputStream.seek should throw EOFException if seeking past the end of file
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-15875
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15875
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Shixiong Zhu
>            Priority: Minor
>
> I read the javadoc of `Seekable.seek` but it doesn't say what should be done when seeking past the end of file. Right now, DFSInputStream throws new EOFException, but S3AInputStream doesn't throw any error.
> I think it's better to have consistent behavior in `seek.`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
