You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2022/11/14 12:35:41 UTC

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #1010: PARQUET-2213: add InputFile.newStream with a read range

steveloughran commented on code in PR #1010:
URL: https://github.com/apache/parquet-mr/pull/1010#discussion_r1021472465


##########
parquet-common/src/main/java/org/apache/parquet/io/InputFile.java:
##########
@@ -41,4 +41,16 @@ public interface InputFile {
    */
   SeekableInputStream newStream() throws IOException;
 
+  /**
+   * Open a new {@link SeekableInputStream} for the underlying data file,
+   * in the range of '[offset, offset + length)'
+   *
+   * @param offset the offset in the file to read from
+   * @param length the total number of bytes to read
+   * @return a new {@link SeekableInputStream} to read the file
+   * @throws IOException if the stream cannot be opened
+   */
+  default SeekableInputStream newStream(long offset, long length) throws IOException {

Review Comment:
   this is what the vectored IO API of hadoop 3.3.5, an API for which we intend to provide in a compatibility library for apps running against hadoop 3.2.x+. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md#default-void-readvectoredlist-extends-filerange-ranges-intfunctionbytebuffer-allocate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org