Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/01/19 08:00:15 UTC

[GitHub] [hadoop] mukund-thakur opened a new pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

mukund-thakur opened a new pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904


   
   
   ### Description of PR
   Adds support for an asynchronous, multi-range (vectored) read API in PositionedReadable. The default implementation iterates through the ranges and reads each one synchronously, but the intent is that FSDataInputStream subclasses, especially the object store implementations, can provide more efficient readers.
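
   A minimal usage sketch (not part of the original PR description) of how a caller might exercise the new API, mirroring the contract tests quoted later in this thread. The FileRangeImpl constructor and the readVectored(ranges, allocate) call are taken from those tests; the package locations, class name and FileRange accessors shown here are assumptions:

   ```java
   import java.nio.ByteBuffer;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.IntFunction;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataInputStream;
   import org.apache.hadoop.fs.FileRange;
   import org.apache.hadoop.fs.FileRangeImpl;   // assumed package location
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class VectoredReadExample {
     public static void main(String[] args) throws Exception {
       Path file = new Path(args[0]);
       FileSystem fs = file.getFileSystem(new Configuration());

       // Ranges to read; implementations may coalesce nearby ranges.
       List<FileRange> ranges = new ArrayList<>();
       ranges.add(new FileRangeImpl(0, 100));           // 100 bytes at offset 0
       ranges.add(new FileRangeImpl(64 * 1024, 4096));  // 4 KiB at offset 64 KiB

       IntFunction<ByteBuffer> allocate = ByteBuffer::allocate;

       try (FSDataInputStream in = fs.open(file)) {
         // Starts the (possibly asynchronous) reads for all ranges.
         in.readVectored(ranges, allocate);
         // Each range carries a future that completes with its buffer.
         for (FileRange range : ranges) {
           ByteBuffer buffer = range.getData().get();
           System.out.println("read " + buffer.remaining()
               + " bytes at offset " + range.getOffset());
         }
       }
     }
   }
   ```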
   
   ### How was this patch tested?
   Added benchmarks.
   Added unit tests.
   Added new contract tests for the new API specification.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




[GitHub] [hadoop] mukund-thakur commented on pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

Posted by GitBox <gi...@apache.org>.
mukund-thakur commented on pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#issuecomment-1016177915


   This is based on https://github.com/apache/hadoop/pull/3499. The plan is to merge this as a base commit in the feature branch and then split the remaining work (other optimisations and features) into new PRs, each with its own JIRA.




[GitHub] [hadoop] hadoop-yetus commented on pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#issuecomment-1023854948


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  shelldocs  |   0m  1s |  |  Shelldocs was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 8 new or modified test files.  |
   |||| _ feature-vectored-io Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m  7s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m 37s |  |  feature-vectored-io passed  |
   | +1 :green_heart: |  compile  |  26m 10s |  |  feature-vectored-io passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |  21m 16s |  |  feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   3m 41s |  |  feature-vectored-io passed  |
   | +1 :green_heart: |  mvnsite  |  27m 27s |  |  feature-vectored-io passed  |
   | +1 :green_heart: |  javadoc  |   8m 34s |  |  feature-vectored-io passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   8m 32s |  |  feature-vectored-io passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +0 :ok: |  spotbugs  |   0m 21s |  |  branch/hadoop-project no spotbugs output file (spotbugsXml.xml)  |
   | +1 :green_heart: |  shadedclient  |  55m 25s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  55m 47s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 48s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  29m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 59s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |  23m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 11s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  21m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   3m 37s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3904/5/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 1 new + 78 unchanged - 3 fixed = 79 total (was 81)  |
   | +1 :green_heart: |  mvnsite  |  22m 54s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m  0s |  |  No new issues.  |
   | +1 :green_heart: |  xml  |   0m 10s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  javadoc  |   8m 36s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   8m 32s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +0 :ok: |  spotbugs  |   0m 21s |  |  hadoop-project has no data from spotbugs  |
   | +1 :green_heart: |  shadedclient  |  56m  3s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 794m 56s | [/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3904/5/artifact/out/patch-unit-root.txt) |  root in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 38s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 1196m 55s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.yarn.csi.client.TestCsiClient |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3904/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3904 |
   | Optional Tests | dupname asflicense codespell shellcheck shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle markdownlint xml |
   | uname | Linux f5cb218f3898 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | feature-vectored-io / 326557de1404f38ca2b02ffacc1b0814c9281d80 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3904/5/testReport/ |
   | Max. process+thread count | 3075 (vs. ulimit of 5500) |
   | modules | C: hadoop-project hadoop-common-project/hadoop-common hadoop-common-project hadoop-tools/hadoop-aws hadoop-tools/hadoop-benchmark hadoop-tools . U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3904/5/console |
   | versions | git=2.25.1 maven=3.6.3 shellcheck=0.7.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] mukund-thakur commented on pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

Posted by GitBox <gi...@apache.org>.
mukund-thakur commented on pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#issuecomment-1023925561


   Yetus is failing because of one known test failure: https://issues.apache.org/jira/browse/YARN-10788
   
   `./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:551:    ChecksumFSOutputSummer(ChecksumFileSystem fs,:5: More than 7 parameters (found 8). [ParameterNumber]`
   
   Also, the only checkstyle error was not introduced by my patch; it was already there.
   
   I think it is time to merge this now. 




[GitHub] [hadoop] mukund-thakur merged pull request #3904: HADOOP-11867. Add a high-performance vectored read API.

Posted by GitBox <gi...@apache.org>.
mukund-thakur merged pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904


   




[GitHub] [hadoop] steveloughran commented on a change in pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#discussion_r789545997



##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java
##########
@@ -120,6 +119,71 @@ public void testVectoredReadAndReadFully()  throws Exception {
     }
   }
 
+  /**
+   * As the minimum seek value is 4*1024,none of the below ranges
+   * will get merged.
+   */
+  @Test
+  public void testDisjointRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 + 101, 100));

Review comment:
       nit: spaces

##########
File path: hadoop-tools/hadoop-benchmark/src/main/java/org/apache/hadoop/benchmark/package-info.java
##########
@@ -0,0 +1,22 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Benchmark for Vectored Read IO operations.
+ */
+package org.apache.hadoop.benchmark;

Review comment:
       nit: newline

##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java
##########
@@ -50,13 +50,12 @@
 
   private static final Logger LOG = LoggerFactory.getLogger(AbstractContractVectoredReadTest.class);
 
-  public static final int DATASET_LEN = 1024;
+  public static final int DATASET_LEN = 64*1024;

Review comment:
       nit: add some spaces

##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java
##########
@@ -120,6 +119,71 @@ public void testVectoredReadAndReadFully()  throws Exception {
     }
   }
 
+  /**
+   * As the minimum seek value is 4*1024,none of the below ranges
+   * will get merged.
+   */
+  @Test
+  public void testDisjointRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 + 101, 100));
+    fileRanges.add(new FileRangeImpl(16*1024 + 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, all the below ranges
+   * will get merged into one.
+   */
+  @Test
+  public void testAllRangesMergedIntoOne() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(8*1024 - 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, the first three ranges will be
+   * merged into and other two will remain as it is.
+   */
+  @Test
+  public void testSomeRangesMergedSomeUnmerged() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(8*1024, 100));
+    fileRanges.add(new FileRangeImpl(14*1024, 100));
+    fileRanges.add(new FileRangeImpl(10*1024, 100));
+    fileRanges.add(new FileRangeImpl(2 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(40*1024, 1024));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  @Test(timeout = 1800000)

Review comment:
       Override getTestTimeoutMillis() for the whole suite; this lets subclasses define really big timeouts if needed.
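
       A hedged sketch of that suggestion, assuming the getTestTimeoutMillis() hook from the contract-test base class; the local-filesystem binding, the class name and the parameterized constructor shape are assumptions for illustration only:

       ```java
       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.contract.AbstractFSContract;
       import org.apache.hadoop.fs.contract.localfs.LocalFSContract;

       /** Hypothetical binding that widens the timeout for the whole suite. */
       public class ITestLocalFSContractVectoredReadSlow
           extends AbstractContractVectoredReadTest {

         // Assumed to mirror the parameterized constructor of the abstract suite.
         public ITestLocalFSContractVectoredReadSlow(String bufferType) {
           super(bufferType);
         }

         @Override
         protected AbstractFSContract createContract(Configuration conf) {
           return new LocalFSContract(conf);
         }

         @Override
         protected int getTestTimeoutMillis() {
           // Thirty minutes for every test in the suite, instead of per-test timeouts.
           return 30 * 60 * 1000;
         }
       }
       ```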

##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java
##########
@@ -120,6 +119,71 @@ public void testVectoredReadAndReadFully()  throws Exception {
     }
   }
 
+  /**
+   * As the minimum seek value is 4*1024,none of the below ranges
+   * will get merged.
+   */
+  @Test
+  public void testDisjointRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 + 101, 100));
+    fileRanges.add(new FileRangeImpl(16*1024 + 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, all the below ranges
+   * will get merged into one.
+   */
+  @Test
+  public void testAllRangesMergedIntoOne() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(8*1024 - 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, the first three ranges will be
+   * merged into and other two will remain as it is.
+   */
+  @Test
+  public void testSomeRangesMergedSomeUnmerged() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(8*1024, 100));
+    fileRanges.add(new FileRangeImpl(14*1024, 100));
+    fileRanges.add(new FileRangeImpl(10*1024, 100));
+    fileRanges.add(new FileRangeImpl(2 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(40*1024, 1024));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  @Test(timeout = 1800000)
+  public void testSameRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(8*1024, 1000));
+    fileRanges.add(new FileRangeImpl(8*1024, 1000));
+    fileRanges.add(new FileRangeImpl(8*1024, 1000));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {

Review comment:
       are you using openFile in tests too?
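
       For reference, a minimal sketch of what an openFile()-based variant of the quoted try-with-resources block might look like, assuming the FileSystem.openFile() builder whose build() returns a future stream; the test method name is hypothetical and the other names are reused from the quoted suite:

       ```java
       @Test
       public void testSameRangesViaOpenFile() throws Exception {
         FileSystem fs = getFileSystem();
         List<FileRange> fileRanges = new ArrayList<>();
         fileRanges.add(new FileRangeImpl(8 * 1024, 1000));
         // openFile() returns a builder; build() yields a future of the stream.
         try (FSDataInputStream in = fs.openFile(path(VECTORED_READ_FILE_NAME))
                  .build()
                  .get()) {
           in.readVectored(fileRanges, allocate);
           validateVectoredReadResult(fileRanges);
         }
       }
       ```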






[GitHub] [hadoop] mukund-thakur commented on a change in pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

Posted by GitBox <gi...@apache.org>.
mukund-thakur commented on a change in pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#discussion_r790581863



##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java
##########
@@ -120,6 +119,71 @@ public void testVectoredReadAndReadFully()  throws Exception {
     }
   }
 
+  /**
+   * As the minimum seek value is 4*1024,none of the below ranges
+   * will get merged.
+   */
+  @Test
+  public void testDisjointRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 + 101, 100));
+    fileRanges.add(new FileRangeImpl(16*1024 + 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, all the below ranges
+   * will get merged into one.
+   */
+  @Test
+  public void testAllRangesMergedIntoOne() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(8*1024 - 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, the first three ranges will be
+   * merged into and other two will remain as it is.
+   */
+  @Test
+  public void testSomeRangesMergedSomeUnmerged() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(8*1024, 100));
+    fileRanges.add(new FileRangeImpl(14*1024, 100));
+    fileRanges.add(new FileRangeImpl(10*1024, 100));
+    fileRanges.add(new FileRangeImpl(2 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(40*1024, 1024));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  @Test(timeout = 1800000)
+  public void testSameRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(8*1024, 1000));
+    fileRanges.add(new FileRangeImpl(8*1024, 1000));
+    fileRanges.add(new FileRangeImpl(8*1024, 1000));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {

Review comment:
       yeah there was already one, adding in one more.






[GitHub] [hadoop] steveloughran edited a comment on pull request #3904: HADOOP-11867. Add a high-performance vectored read API.

Posted by GitBox <gi...@apache.org>.
steveloughran edited a comment on pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#issuecomment-1026711104


   +1 to merge into this branch as feature complete; now we follow up with the final integration work.
   
   * Remove the extended timeout and let's see if things are good. If not, the change can go in as a single patch into trunk.
   * This PR can/should still be rebased as relevant changes go in; it just needs to be co-ordinated with anyone else who has the branch checked out.
   * But I do want the patches to go in as independent commits for better tracing, with "part of HADOOP-11867" in each patch so a `git log --grep` will find them all; see the example after this list.
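
   For instance, once the follow-up commits land with that tag in their messages, something like the following would list them all (branch name taken from the Yetus report above):

   ```
   git log --oneline --grep='part of HADOOP-11867' feature-vectored-io
   ```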
   




[GitHub] [hadoop] mukund-thakur commented on a change in pull request #3904: HADOOP-11867. Add a high performance vectored read API to file system.

Posted by GitBox <gi...@apache.org>.
mukund-thakur commented on a change in pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#discussion_r790579683



##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java
##########
@@ -120,6 +119,71 @@ public void testVectoredReadAndReadFully()  throws Exception {
     }
   }
 
+  /**
+   * As the minimum seek value is 4*1024,none of the below ranges
+   * will get merged.
+   */
+  @Test
+  public void testDisjointRanges() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 + 101, 100));
+    fileRanges.add(new FileRangeImpl(16*1024 + 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, all the below ranges
+   * will get merged into one.
+   */
+  @Test
+  public void testAllRangesMergedIntoOne() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(0, 100));
+    fileRanges.add(new FileRangeImpl(4 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(8*1024 - 101, 100));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  /**
+   * As the minimum seek value is 4*1024, the first three ranges will be
+   * merged into and other two will remain as it is.
+   */
+  @Test
+  public void testSomeRangesMergedSomeUnmerged() throws Exception {
+    FileSystem fs = getFileSystem();
+    List<FileRange> fileRanges = new ArrayList<>();
+    fileRanges.add(new FileRangeImpl(8*1024, 100));
+    fileRanges.add(new FileRangeImpl(14*1024, 100));
+    fileRanges.add(new FileRangeImpl(10*1024, 100));
+    fileRanges.add(new FileRangeImpl(2 *1024 - 101, 100));
+    fileRanges.add(new FileRangeImpl(40*1024, 1024));
+    try (FSDataInputStream in = fs.open(path(VECTORED_READ_FILE_NAME))) {
+      in.readVectored(fileRanges, allocate);
+      validateVectoredReadResult(fileRanges);
+    }
+  }
+
+  @Test(timeout = 1800000)

Review comment:
       ah.. that was just for testing.






[GitHub] [hadoop] steveloughran commented on pull request #3904: HADOOP-11867. Add a high-performance vectored read API.

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #3904:
URL: https://github.com/apache/hadoop/pull/3904#issuecomment-1026711104


   +1 to merge into this branch as feature complete; now we follow up with the final integration work.
   
   * Remove the extended timeout and let's see if things are good. If not, the change can go in as a single patch into trunk.
   * This PR can/should still be rebased as relevant changes go in; it just needs to be co-ordinated with anyone else who has the branch checked out.
   * But I do want the patches to go in as independent commits for better tracing, with "part of HADOOP-11867" in each patch so a `git log --grep` will find them all.
   

