
Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/26 13:19:00 UTC

[jira] [Commented] (HADOOP-15245) S3AInputStream.skip() to use lazy seek

    [ https://issues.apache.org/jira/browse/HADOOP-15245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624461#comment-17624461 ] 

ASF GitHub Bot commented on HADOOP-15245:
-----------------------------------------

steveloughran commented on PR #3927:
URL: https://github.com/apache/hadoop/pull/3927#issuecomment-1292022816

   Can you rebase so we can look at this again?




> S3AInputStream.skip() to use lazy seek
> --------------------------------------
>
>                 Key: HADOOP-15245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15245
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The default skip() reads and discards all bytes, no matter how far ahead the skip is. This is very inefficient when skip() is called on an S3A stream in random IO mode, though exactly what to do in sequential mode is less clear.
> Proposed: 
> * add an optimized version of S3AInputStream.skip() which does a lazy seek, which itself will decide when to skip() vs issue a new GET.
> * add some more instrumentation to measure how often this gets used
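
As a rough illustration of the proposal above, here is a minimal sketch of a lazily seeking skip(). It is not the actual S3AInputStream code. The class name LazySkipSketch, the forwardSeekLimit threshold, and the counter fields are all hypothetical. A byte array stands in for the remote object, and opening a ByteArrayInputStream at an offset stands in for issuing a new ranged GET.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/**
 * Hypothetical sketch of a lazy skip(); NOT the real S3AInputStream.
 * skip() only records the target position. The next read() decides whether
 * to drain a few bytes from the open stream or to "issue a new GET".
 */
class LazySkipSketch extends InputStream {
  private final byte[] object;           // stand-in for the remote object
  private final long forwardSeekLimit;   // assumed threshold: max bytes worth draining
  private ByteArrayInputStream wrapped;  // currently open "GET", or null
  private long streamPos;                // position of the wrapped stream
  private long nextReadPos;              // position the caller expects to read from

  // Simple instrumentation counters (illustrative names).
  long opens;           // number of simulated GET requests
  long bytesDiscarded;  // bytes drained from an already-open stream
  long lazySkips;       // skip() calls that moved the position

  LazySkipSketch(byte[] object, long forwardSeekLimit) {
    this.object = object;
    this.forwardSeekLimit = forwardSeekLimit;
  }

  @Override
  public long skip(long n) {
    if (n <= 0) {
      return 0;
    }
    // Lazy part: no IO here, only update the position of the next read.
    long target = Math.min(nextReadPos + n, object.length);
    long skipped = target - nextReadPos;
    nextReadPos = target;
    if (skipped > 0) {
      lazySkips++;
    }
    return skipped;
  }

  /** Bring the wrapped stream to nextReadPos, as cheaply as possible. */
  private void seekInStream() {
    long diff = nextReadPos - streamPos;
    if (wrapped != null && diff > 0 && diff <= forwardSeekLimit) {
      // Short forward move: drain bytes from the stream that is already open.
      long remaining = diff;
      while (remaining > 0) {
        long s = wrapped.skip(remaining);
        if (s <= 0) {
          break;
        }
        remaining -= s;
      }
      bytesDiscarded += diff - remaining;
      streamPos += diff - remaining;
    }
    if (wrapped == null || streamPos != nextReadPos) {
      // Long or backward move: drop the old stream and "issue a new GET".
      // A real stream would close or abort the HTTP connection here.
      int offset = (int) nextReadPos;
      wrapped = new ByteArrayInputStream(object, offset, object.length - offset);
      streamPos = nextReadPos;
      opens++;
    }
  }

  @Override
  public int read() {
    if (nextReadPos >= object.length) {
      return -1;
    }
    seekInStream();
    int b = wrapped.read();
    if (b >= 0) {
      streamPos++;
      nextReadPos++;
    }
    return b;
  }
}
```

In this sketch a skip() followed by no read costs nothing, a short skip is served by draining the open stream, and a long skip turns into a single reopen at the new offset. The counters give a crude view of how often each path is taken, in the spirit of the instrumentation bullet above.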



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
