You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by st...@apache.org on 2022/06/22 17:21:52 UTC
[hadoop] 06/06: HADOOP-18103. Add a high-performance vectored read API. (#4476)
This is an automated email from the ASF dual-hosted git repository.
stevel pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/hadoop.git
commit e1842b2a749d79cbdc15c524515b9eda64c339d5
Merge: e6ecc4f3e44 4d1f6f9b995
Author: Steve Loughran <st...@cloudera.com>
AuthorDate: Wed Jun 22 17:33:40 2022 +0100
HADOOP-18103. Add a high-performance vectored read API. (#4476)
This feature adds methods for ranged vectored read operations
in PositionedReadable.
All stream which implement that interface support the new API.
The default implementation reads each range in the vector
sequentially.
However, specific implementations may provide higher performance
versions. This is done in two places
* Local FileSystem/Checksum FileSystem
* The S3A client.
The S3A client first coalesces adjacent and "nearby" ranges
together, then fetches each range in separate HTTP GET requests,
executed in parallel. As such it delivers significant speedups
to applications reading separate blocks of data from the same
file, columnar data format libraries in particular.
This is the merge commit of the feature branch; the work is in
HADOOP-11867. Add a high-performance vectored read API.
HADOOP-18104. S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads.
HADOOP-18107. Adding scale test for vectored reads for large file
HADOOP-18105. Implement buffer pooling with weak references.
HADOOP-18106. Handle memory fragmentation in S3A Vectored IO.
Contributed By: Owen O'Malley and Mukund Thakur
dev-support/Jenkinsfile | 2 +-
.../apache/hadoop/fs/BufferedFSInputStream.java | 27 +-
.../org/apache/hadoop/fs/ChecksumFileSystem.java | 213 +++++++++--
.../org/apache/hadoop/fs/FSDataInputStream.java | 22 +-
.../main/java/org/apache/hadoop/fs/FileRange.java | 67 ++++
.../org/apache/hadoop/fs/PositionedReadable.java | 41 ++-
.../org/apache/hadoop/fs/RawLocalFileSystem.java | 110 +++++-
.../org/apache/hadoop/fs/StreamCapabilities.java | 6 +
.../org/apache/hadoop/fs/VectoredReadUtils.java | 292 +++++++++++++++
.../apache/hadoop/fs/impl/CombinedFileRange.java | 70 ++++
.../org/apache/hadoop/fs/impl/FileRangeImpl.java | 74 ++++
.../java/org/apache/hadoop/io/ByteBufferPool.java | 5 +
.../apache/hadoop/io/ElasticByteBufferPool.java | 4 +-
.../io/WeakReferencedElasticByteBufferPool.java | 155 ++++++++
.../site/markdown/filesystem/fsdatainputstream.md | 39 ++
.../apache/hadoop/fs/TestVectoredReadUtils.java | 371 +++++++++++++++++++
.../contract/AbstractContractVectoredReadTest.java | 406 +++++++++++++++++++++
.../hadoop/fs/contract/ContractTestUtils.java | 84 +++++
.../localfs/TestLocalFSContractVectoredRead.java | 86 +++++
.../rawlocal/TestRawLocalContractVectoredRead.java | 35 ++
...estMoreWeakReferencedElasticByteBufferPool.java | 97 +++++
.../TestWeakReferencedElasticByteBufferPool.java | 232 ++++++++++++
.../java/org/apache/hadoop/test/MoreAsserts.java | 49 ++-
hadoop-common-project/pom.xml | 1 -
hadoop-project/pom.xml | 11 +
.../java/org/apache/hadoop/fs/s3a/Constants.java | 26 ++
.../org/apache/hadoop/fs/s3a/S3AFileSystem.java | 39 +-
.../org/apache/hadoop/fs/s3a/S3AInputStream.java | 391 +++++++++++++++++++-
.../org/apache/hadoop/fs/s3a/S3AReadOpContext.java | 20 +-
.../apache/hadoop/fs/s3a/VectoredIOContext.java | 78 ++++
.../fs/s3a/impl/GetContentSummaryOperation.java | 3 +-
.../site/markdown/tools/hadoop-aws/performance.md | 30 ++
.../contract/s3a/ITestS3AContractVectoredRead.java | 159 ++++++++
.../hadoop/fs/s3a/TestS3AInputStreamRetry.java | 3 +-
.../fs/s3a/scale/AbstractSTestS3AHugeFiles.java | 32 ++
.../hadoop-aws/src/test/resources/log4j.properties | 2 +-
hadoop-tools/hadoop-benchmark/pom.xml | 94 +++++
.../hadoop-benchmark/src/main/assembly/uber.xml | 33 ++
.../hadoop-benchmark/src/main/findbugs/exclude.xml | 22 ++
.../hadoop/benchmark/VectoredReadBenchmark.java | 245 +++++++++++++
.../org/apache/hadoop/benchmark/package-info.java | 22 ++
hadoop-tools/pom.xml | 1 +
pom.xml | 1 +
43 files changed, 3603 insertions(+), 97 deletions(-)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org