Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2020/04/29 05:58:00 UTC

[jira] [Resolved] (HDDS-3223) Improve s3g read 1GB object efficiency by 100 times

     [ https://issues.apache.org/jira/browse/HDDS-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham resolved HDDS-3223.
--------------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

> Improve s3g read 1GB object efficiency by 100 times 
> ----------------------------------------------------
>
>                 Key: HDDS-3223
>                 URL: https://issues.apache.org/jira/browse/HDDS-3223
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>         Attachments: screenshot-1.png
>
>
> *What's the problem?*
> Reading a 1000M object takes about 470 seconds, i.e. about 2.2M/s, which is too slow.
> *What's the reason?*
> Reading a 1000M file issues 50 GET requests, each of which reads 20M. For each GET, the call stack is: [IOUtils::copyLarge|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L262] -> [IOUtils::skipFully|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1190] -> [IOUtils::skip|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L2064] -> [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957].
> This means that the 50th GET request, which should read only 980M-1000M, first skips 0-980M by calling [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957] over that whole range. So the 1st GET request reads 0-20M, the 2nd reads 0-40M, the 3rd reads 0-60M, ..., and the 50th reads 0-1000M. Each successive GET request is therefore slower than the one before it.
> See [IO-203|https://issues.apache.org/jira/browse/IO-203] for why IOUtils implements skip by reading rather than by a real skip, e.g. seek.
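
The cost described above can be illustrated with a small, self-contained Java sketch. It does not use the Ozone or commons-io code: `CountingStream`, its `seek` method, and `totalBytesRead` are hypothetical names introduced only for this illustration, and the stream serves dummy bytes. The sketch assumes 50 range requests of 20M over a 1000M object, as in the description, and counts how many bytes pass through `read` when each request skips to its offset by reading versus by seeking.

```java
import java.io.IOException;
import java.io.InputStream;

public class RangeReadCost {
    static final long MB = 1024L * 1024L;

    /** Hypothetical stand-in for an object stream: serves `length` dummy bytes and counts bytes delivered by read(). */
    static final class CountingStream extends InputStream {
        private final long length;
        private long position;
        long bytesRead;

        CountingStream(long length) {
            this.length = length;
        }

        @Override
        public int read() {
            if (position >= length) {
                return -1;
            }
            position++;
            bytesRead++;
            return 0;
        }

        @Override
        public int read(byte[] buf, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (position >= length) {
                return -1;
            }
            int n = (int) Math.min(len, length - position);
            position += n;   // no data is copied; only the position advances
            bytesRead += n;
            return n;
        }

        /** Assumed "real skip": moves the position without reading anything. */
        void seek(long target) {
            position = Math.min(target, length);
        }
    }

    /** Reads and discards exactly `count` bytes through an 8K buffer. */
    static void readFully(InputStream in, long count) throws IOException {
        byte[] buf = new byte[8192];
        long remaining = count;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("unexpected end of stream");
            }
            remaining -= n;
        }
    }

    /** Sums the bytes read across all range requests, each on a fresh stream. */
    static long totalBytesRead(long objectSize, long rangeSize, boolean useSeek) throws IOException {
        long total = 0;
        for (long start = 0; start < objectSize; start += rangeSize) {
            CountingStream in = new CountingStream(objectSize);
            if (useSeek) {
                in.seek(start);          // jump straight to the range start
            } else {
                readFully(in, start);    // skip by reading everything before the range
            }
            readFully(in, Math.min(rangeSize, objectSize - start));
            total += in.bytesRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long bySkip = totalBytesRead(1000 * MB, 20 * MB, false);
        long bySeek = totalBytesRead(1000 * MB, 20 * MB, true);
        System.out.println("skip-by-read: " + bySkip / MB + "M"); // prints "skip-by-read: 25500M"
        System.out.println("seek:         " + bySeek / MB + "M"); // prints "seek:         1000M"
    }
}
```

Under these assumptions the skip-by-read variant reads 20M x (1 + 2 + ... + 50) = 25500M in total, about 25 times the object size, while the seek variant reads each byte once. The sketch only counts bytes; it does not model network or disk latency, so it does not by itself account for the measured 470 seconds.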


