
Posted to common-issues@hadoop.apache.org by "Bhalchandra Pandit (Jira)" <ji...@apache.org> on 2021/11/29 16:13:00 UTC

[jira] [Created] (HADOOP-18028) improve S3 read speed using prefetching & caching

Bhalchandra Pandit created HADOOP-18028:
-------------------------------------------

             Summary: improve S3 read speed using prefetching & caching
                 Key: HADOOP-18028
                 URL: https://issues.apache.org/jira/browse/HADOOP-18028
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
            Reporter: Bhalchandra Pandit


I work for Pinterest. I developed a technique that vastly improves read throughput when reading from the S3 file system. It helps not only the sequential-read case (e.g., reading a SequenceFile) but also the random-access case (e.g., reading Parquet). This technique has significantly improved the efficiency of data processing jobs at Pinterest.
 
I would like to contribute this feature to Apache Hadoop. More details on the technique are available in a blog post I wrote recently:
[https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
