You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/02/23 19:33:54 UTC

[GitHub] [hadoop] steveloughran commented on a change in pull request #3964: HADOOP-18104: S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads

steveloughran commented on a change in pull request #3964:
URL: https://github.com/apache/hadoop/pull/3964#discussion_r813241265



##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/impl/TestVectoredReadUtils.java
##########
@@ -207,6 +207,32 @@ public void testSortAndMergeMoreCases() throws Exception {
             VectoredReadUtils.isOrderedDisjoint(outputList, 1, 800));
 
   }
+
+  @Test
+  public void testMaxSizeZeroDisablesMering() throws Exception {
+    List<FileRange> randomRanges = Arrays.asList(
+            new FileRangeImpl(3000, 110),
+            new FileRangeImpl(3000, 100),
+            new FileRangeImpl(2100, 100)
+    );
+
+    List<CombinedFileRange> combinedFileRanges = VectoredReadUtils
+            .sortAndMergeRanges(randomRanges, 1, 1, 0);
+    assertEquals("Mismatch in number of ranges post merging",

Review comment:
       if you use the assertj assertion on list size, you get a more detailed message if there's a mismatch

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
##########
@@ -1066,4 +1066,29 @@ private Constants() {
    * Require that all S3 access is made through Access Points.
    */
   public static final String AWS_S3_ACCESSPOINT_REQUIRED = "fs.s3a.accesspoint.required";
+
+  /**
+   * What is the smallest reasonable seek that we should group ranges
+   * together during vectored read operation.
+   * Value : {@value}.
+   */
+  public static final String AWS_S3_MIN_SEEK_VECTOR_READS = "fs.s3a.min.seek.vectored.read";

Review comment:
       should we always have vectored earlier, e.g `fs.s3a.vectored.read.min.seek``fs.s3a.vectored.read.max.read.size`. etc

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
##########
@@ -1066,4 +1066,29 @@ private Constants() {
    * Require that all S3 access is made through Access Points.
    */
   public static final String AWS_S3_ACCESSPOINT_REQUIRED = "fs.s3a.accesspoint.required";
+
+  /**
+   * What is the smallest reasonable seek that we should group ranges
+   * together during vectored read operation.
+   * Value : {@value}.
+   */
+  public static final String AWS_S3_MIN_SEEK_VECTOR_READS = "fs.s3a.min.seek.vectored.read";
+
+  /**
+   * What is the largest size that we should group ranges
+   * together during vectored read?

Review comment:
       not sure that "readsize" is the best term here

##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MoreAsserts.java
##########
@@ -86,4 +86,16 @@
                     "completed exceptionally")
             .isTrue();
   }
+
+  /**
+   * Assert two same type of values.
+   * @param actual actual value.
+   * @param expected expected value.
+   * @param message error message to print in case of mismatch.
+   */
+  public static <T> void assertEqual(T actual, T expected, String message) {
+    Assertions.assertThat(actual)
+            .describedAs("Mismatch in "+ message)

Review comment:
       nit: add a space between " and +

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AReadOpContext.java
##########
@@ -69,17 +75,19 @@
    * @param changeDetectionPolicy change detection policy.
    * @param readahead readahead for GET operations/skip, etc.
    * @param auditSpan active audit
+   * @param vectoredIOContext

Review comment:
       any description?

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
##########
@@ -51,4 +58,44 @@ public void testEOFRanges() throws Exception {
     fileRanges.add(new FileRangeImpl(DATASET_LEN, 100));
     testExceptionalVectoredRead(fs, fileRanges, "EOFException is expected");
   }
+
+  @Test
+  public void testMinSeekAndMaxSizeConfigsPropagation() throws Exception {
+    Configuration conf = getFileSystem().getConf();
+    S3ATestUtils.disableFilesystemCaching(conf);
+    final int configuredMinSeek = 1024;
+    final int configuredMaxSize = 2 * 1024;
+    conf.setInt(Constants.AWS_S3_MIN_SEEK_VECTOR_READS, configuredMinSeek);
+    conf.setInt(Constants.AWS_S3_MAX_READSIZE_VECTOR_READS, configuredMaxSize);
+    try (S3AFileSystem fs = S3ATestUtils.createTestFileSystem(conf)) {
+      try (FSDataInputStream fis = fs.open(path(VECTORED_READ_FILE_NAME))) {
+        System.out.println(fis.toString());

Review comment:
       use logging

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AReadOpContext.java
##########
@@ -69,17 +75,19 @@
    * @param changeDetectionPolicy change detection policy.
    * @param readahead readahead for GET operations/skip, etc.
    * @param auditSpan active audit
+   * @param vectoredIOContext
    */
   public S3AReadOpContext(
-      final Path path,
-      Invoker invoker,
-      @Nullable FileSystem.Statistics stats,
-      S3AStatisticsContext instrumentation,
-      FileStatus dstFileStatus,
-      S3AInputPolicy inputPolicy,
-      ChangeDetectionPolicy changeDetectionPolicy,
-      final long readahead,
-      final AuditSpan auditSpan) {
+          final Path path,

Review comment:
       not sure an indentation change was needed, but if it is, well, ok

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
##########
@@ -19,15 +19,23 @@
 package org.apache.hadoop.fs.contract.s3a;
 
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileRange;
 import org.apache.hadoop.fs.FileRangeImpl;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.contract.AbstractContractVectoredReadTest;
 import org.apache.hadoop.fs.contract.AbstractFSContract;
+import org.apache.hadoop.fs.s3a.Constants;
+import org.apache.hadoop.fs.s3a.S3AFileSystem;
+import org.apache.hadoop.fs.s3a.S3ATestUtils;
 
 import java.util.ArrayList;
 import java.util.List;
 
+import org.junit.Test;

Review comment:
       i think the ordering of imports is already mixed up here; should be
   java
   other
   org.apache

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
##########
@@ -51,4 +58,44 @@ public void testEOFRanges() throws Exception {
     fileRanges.add(new FileRangeImpl(DATASET_LEN, 100));
     testExceptionalVectoredRead(fs, fileRanges, "EOFException is expected");
   }
+
+  @Test
+  public void testMinSeekAndMaxSizeConfigsPropagation() throws Exception {
+    Configuration conf = getFileSystem().getConf();
+    S3ATestUtils.disableFilesystemCaching(conf);
+    final int configuredMinSeek = 1024;
+    final int configuredMaxSize = 2 * 1024;
+    conf.setInt(Constants.AWS_S3_MIN_SEEK_VECTOR_READS, configuredMinSeek);

Review comment:
       use the `S3ATestUtils.removeBaseAndBucketOverrides()` cal to remove the constants before you set them, in case the test bucket sets them explicitly




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org