Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/03/28 11:36:00 UTC

[jira] [Updated] (HADOOP-18179) Boost S3A Stream Read Performance

     [ https://issues.apache.org/jira/browse/HADOOP-18179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-18179:
------------------------------------
    Description: 
Calibrate S3A input stream performance against recent applications and data formats, and improve it where necessary.

HADOOP-18028 is a key part of this, but there are other issues and opportunities:

# We could add machine-parsable trace-level logging in FSDataInputStream to collect stats on how the stream APIs are invoked, then collect that data from real apps and analyze it.
# Implement those APIs which some apps use (ByteBufferPositionedReadable), not so much for the direct implementation as to get better information from the app about its read plan.
# The `normal` mode doesn't switch from sequential on forward seeks. Is that always appropriate?
# Choose different buffering options when doing whole-file IO vs. sequential vs. random IO.
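
As a rough illustration of the first and third points, here is a minimal sketch of what a machine-parsable trace of stream calls might look like, and how it could be summarized. It is plain JDK code, not Hadoop code: the class StreamTrace, its methods, and the key=value record format are all hypothetical and chosen only for illustration. The "backward seek means random" rule in observedPattern() is an assumption that mirrors the behaviour described for `normal` mode above, where forward seeks alone do not change the mode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical recorder for stream API calls. It does no IO itself; a real
 * stream wrapper would call these methods alongside the actual operations.
 */
final class StreamTrace {
    private final List<String> records = new ArrayList<>();
    private long pos;            // current logical stream position
    private long forwardSeeks;   // seeks to a position ahead of pos
    private long backwardSeeks;  // seeks to a position behind pos

    /** Record a seek to an absolute position and classify its direction. */
    public void seek(long target) {
        if (target < 0) {
            throw new IllegalArgumentException("negative seek: " + target);
        }
        if (target > pos) {
            forwardSeeks++;
        } else if (target < pos) {
            backwardSeeks++;
        }
        records.add("op=seek from=" + pos + " to=" + target);
        pos = target;
    }

    /** Record a read of len bytes at the current position; advances pos. */
    public void read(int len) {
        records.add("op=read pos=" + pos + " len=" + len);
        pos += len;
    }

    /** Record a positioned read, which leaves the stream position unchanged. */
    public void pread(long position, int len) {
        records.add("op=pread pos=" + position + " len=" + len);
    }

    /** The trace so far, one key=value record per call. */
    public List<String> records() {
        return Collections.unmodifiableList(records);
    }

    public long forwardSeeks() {
        return forwardSeeks;
    }

    public long backwardSeeks() {
        return backwardSeeks;
    }

    /**
     * Assumed classification rule: any backward seek marks the access
     * pattern as random; forward seeks alone leave it sequential.
     */
    public String observedPattern() {
        return backwardSeeks > 0 ? "random" : "sequential";
    }

    public static void main(String[] args) {
        StreamTrace trace = new StreamTrace();
        trace.read(4096);
        trace.seek(1L << 20);   // forward seek
        trace.read(4096);
        trace.seek(0);          // backward seek
        trace.records().forEach(System.out::println);
        System.out.println("pattern=" + trace.observedPattern());
    }
}
```

Traces like this, gathered from real applications, would show how often forward seeks dominate, which is the data needed to judge whether staying in sequential mode on forward seeks is appropriate.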

> Boost S3A Stream Read Performance
> ---------------------------------
>
>                 Key: HADOOP-18179
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18179
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.2
>            Reporter: Steve Loughran
>            Priority: Major


