You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/10/12 15:48:32 UTC

[GitHub] [ozone] umamaheswararao commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

umamaheswararao commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727229155



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       Once you seeked, you can reset the seeked flag?

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.

Review comment:
       Please format below line.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockInputStream.java
##########
@@ -281,6 +282,69 @@ public void testReadCrossingMultipleECChunkBounds() throws IOException {
     }
   }
 
+  @Test(expected = EOFException.class)
+  public void testSeekPastBlockLength() throws IOException {
+    repConfig = new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+        ONEMB);
+    OmKeyLocationInfo keyInfo = createKeyInfo(repConfig, 5, 100);
+    try (ECBlockInputStream ecb = new ECBlockInputStream(repConfig,
+        keyInfo, true, null, null, streamFactory)) {
+      ecb.seek(1000);
+    }
+  }
+

Review comment:
       There is a special condition in code about file length and seek pos to be same as 0. It may be a good idea to just add one test to cover that case.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestKeyInputStream.java
##########
@@ -198,6 +220,37 @@ public void testSeekRandomly() throws Exception {
     keyInputStream.close();
   }
 
+  public void testECSeek() throws Exception {
+    int ecChunkSize = 1024 * 1024;
+    ECReplicationConfig repConfig = new ECReplicationConfig(3, 2, RS,
+        ecChunkSize);
+    String keyName = getNewKeyName();
+    // 3 full EC blocks plus one chunk
+    int dataLength = (9 * BLOCK_SIZE + ecChunkSize);
+
+    byte[] inputData = writeRandomBytes(keyName, repConfig, dataLength);
+    KeyInputStream keyInputStream = getKeyInputStream(keyName);

Review comment:
       Please use try for auto closing the stream.

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -309,8 +326,20 @@ public synchronized void unbuffer() {
   }
 
   @Override
-  public synchronized void seek(long l) throws IOException {
-    throw new NotImplementedException("Seek is not implements for EC");
+  public synchronized void seek(long pos) throws IOException {
+    checkOpen();
+    if (pos < 0 || pos >= getLength()) {
+      if (pos == 0) {
+        // It is possible for length and pos to be zero in which case
+        // seek should return instead of throwing exception
+        return;
+      }
+      throw new EOFException(
+          "EOF encountered at pos: " + pos + " for block: "
+              + blockInfo.getBlockID());
+    }
+    position = pos;

Review comment:
       if seeking to the same position, need not set this flag to true right?

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockInputStream.java
##########
@@ -281,6 +282,69 @@ public void testReadCrossingMultipleECChunkBounds() throws IOException {
     }
   }
 
+  @Test(expected = EOFException.class)
+  public void testSeekPastBlockLength() throws IOException {
+    repConfig = new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+        ONEMB);
+    OmKeyLocationInfo keyInfo = createKeyInfo(repConfig, 5, 100);
+    try (ECBlockInputStream ecb = new ECBlockInputStream(repConfig,
+        keyInfo, true, null, null, streamFactory)) {
+      ecb.seek(1000);
+    }
+  }
+

Review comment:
       Please ignore this comment. I saw that test below. Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org