You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/10/08 10:20:11 UTC

[GitHub] [ozone] sodonnel opened a new pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

sodonnel opened a new pull request #2723:
URL: https://github.com/apache/ozone/pull/2723


   ## What changes were proposed in this pull request?
   
   Implement seek on the ECBlockInputStream
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5552
   
   ## How was this patch tested?
   
   New unit and integration tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r728236731



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       Got it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727328851



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       No, because as it stands we don't track if we have seeked all streams or not. Tracking this would add some additional complexity so I figured it was easier to just do the check each time after a block has been seeked.
   
   The usual case would not have seeks, and if we have seeked the block at all then we can pay the small overhead of checking the position on each read.

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.

Review comment:
       I'm not sure what the issue is with the below line?

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -309,8 +326,20 @@ public synchronized void unbuffer() {
   }
 
   @Override
-  public synchronized void seek(long l) throws IOException {
-    throw new NotImplementedException("Seek is not implements for EC");
+  public synchronized void seek(long pos) throws IOException {
+    checkOpen();
+    if (pos < 0 || pos >= getLength()) {
+      if (pos == 0) {
+        // It is possible for length and pos to be zero in which case
+        // seek should return instead of throwing exception
+        return;
+      }
+      throw new EOFException(
+          "EOF encountered at pos: " + pos + " for block: "
+              + blockInfo.getBlockID());
+    }
+    position = pos;

Review comment:
       Yea I guess we can add an optimization to skip seeking if we are seeking to the current position.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727328851



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       No, because as it stands we don't track if we have seeked all streams or not. Tracking this would add some additional complexity so I figured it was easier to just do the check each time after a block has been seeked.
   
   The usual case would not have seeks, and if we have seeked the block at all then we can pay the small overhead of checking the position on each read.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2723:
URL: https://github.com/apache/ozone/pull/2723


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#issuecomment-942470933


   Latest changes looks good to me. +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2723:
URL: https://github.com/apache/ozone/pull/2723


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727336192



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -309,8 +326,20 @@ public synchronized void unbuffer() {
   }
 
   @Override
-  public synchronized void seek(long l) throws IOException {
-    throw new NotImplementedException("Seek is not implements for EC");
+  public synchronized void seek(long pos) throws IOException {
+    checkOpen();
+    if (pos < 0 || pos >= getLength()) {
+      if (pos == 0) {
+        // It is possible for length and pos to be zero in which case
+        // seek should return instead of throwing exception
+        return;
+      }
+      throw new EOFException(
+          "EOF encountered at pos: " + pos + " for block: "
+              + blockInfo.getBlockID());
+    }
+    position = pos;

Review comment:
       Yea I guess we can add an optimization to skip seeking if we are seeking to the current position.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727329341



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.

Review comment:
       I'm not sure what the issue is with the below line?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727229155



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       Once you seeked, you can reset the seeked flag?

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.

Review comment:
       Please format below line.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockInputStream.java
##########
@@ -281,6 +282,69 @@ public void testReadCrossingMultipleECChunkBounds() throws IOException {
     }
   }
 
+  @Test(expected = EOFException.class)
+  public void testSeekPastBlockLength() throws IOException {
+    repConfig = new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+        ONEMB);
+    OmKeyLocationInfo keyInfo = createKeyInfo(repConfig, 5, 100);
+    try (ECBlockInputStream ecb = new ECBlockInputStream(repConfig,
+        keyInfo, true, null, null, streamFactory)) {
+      ecb.seek(1000);
+    }
+  }
+

Review comment:
       There is a special condition in code about file length and seek pos to be same as 0. It may be a good idea to just add one test to cover that case.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestKeyInputStream.java
##########
@@ -198,6 +220,37 @@ public void testSeekRandomly() throws Exception {
     keyInputStream.close();
   }
 
+  public void testECSeek() throws Exception {
+    int ecChunkSize = 1024 * 1024;
+    ECReplicationConfig repConfig = new ECReplicationConfig(3, 2, RS,
+        ecChunkSize);
+    String keyName = getNewKeyName();
+    // 3 full EC blocks plus one chunk
+    int dataLength = (9 * BLOCK_SIZE + ecChunkSize);
+
+    byte[] inputData = writeRandomBytes(keyName, repConfig, dataLength);
+    KeyInputStream keyInputStream = getKeyInputStream(keyName);

Review comment:
       Please use try for auto closing the stream.

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -309,8 +326,20 @@ public synchronized void unbuffer() {
   }
 
   @Override
-  public synchronized void seek(long l) throws IOException {
-    throw new NotImplementedException("Seek is not implements for EC");
+  public synchronized void seek(long pos) throws IOException {
+    checkOpen();
+    if (pos < 0 || pos >= getLength()) {
+      if (pos == 0) {
+        // It is possible for length and pos to be zero in which case
+        // seek should return instead of throwing exception
+        return;
+      }
+      throw new EOFException(
+          "EOF encountered at pos: " + pos + " for block: "
+              + blockInfo.getBlockID());
+    }
+    position = pos;

Review comment:
       if seeking to the same position, need not set this flag to true right?

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockInputStream.java
##########
@@ -281,6 +282,69 @@ public void testReadCrossingMultipleECChunkBounds() throws IOException {
     }
   }
 
+  @Test(expected = EOFException.class)
+  public void testSeekPastBlockLength() throws IOException {
+    repConfig = new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+        ONEMB);
+    OmKeyLocationInfo keyInfo = createKeyInfo(repConfig, 5, 100);
+    try (ECBlockInputStream ecb = new ECBlockInputStream(repConfig,
+        keyInfo, true, null, null, streamFactory)) {
+      ecb.seek(1000);
+    }
+  }
+

Review comment:
       Please ignore this comment. I saw that test below. Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727229155



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       Once you seeked, you can reset the seeked flag?

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.

Review comment:
       Please format below line.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockInputStream.java
##########
@@ -281,6 +282,69 @@ public void testReadCrossingMultipleECChunkBounds() throws IOException {
     }
   }
 
+  @Test(expected = EOFException.class)
+  public void testSeekPastBlockLength() throws IOException {
+    repConfig = new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+        ONEMB);
+    OmKeyLocationInfo keyInfo = createKeyInfo(repConfig, 5, 100);
+    try (ECBlockInputStream ecb = new ECBlockInputStream(repConfig,
+        keyInfo, true, null, null, streamFactory)) {
+      ecb.seek(1000);
+    }
+  }
+

Review comment:
       There is a special condition in code about file length and seek pos to be same as 0. It may be a good idea to just add one test to cover that case.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestKeyInputStream.java
##########
@@ -198,6 +220,37 @@ public void testSeekRandomly() throws Exception {
     keyInputStream.close();
   }
 
+  public void testECSeek() throws Exception {
+    int ecChunkSize = 1024 * 1024;
+    ECReplicationConfig repConfig = new ECReplicationConfig(3, 2, RS,
+        ecChunkSize);
+    String keyName = getNewKeyName();
+    // 3 full EC blocks plus one chunk
+    int dataLength = (9 * BLOCK_SIZE + ecChunkSize);
+
+    byte[] inputData = writeRandomBytes(keyName, repConfig, dataLength);
+    KeyInputStream keyInputStream = getKeyInputStream(keyName);

Review comment:
       Please use try for auto closing the stream.

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -309,8 +326,20 @@ public synchronized void unbuffer() {
   }
 
   @Override
-  public synchronized void seek(long l) throws IOException {
-    throw new NotImplementedException("Seek is not implements for EC");
+  public synchronized void seek(long pos) throws IOException {
+    checkOpen();
+    if (pos < 0 || pos >= getLength()) {
+      if (pos == 0) {
+        // It is possible for length and pos to be zero in which case
+        // seek should return instead of throwing exception
+        return;
+      }
+      throw new EOFException(
+          "EOF encountered at pos: " + pos + " for block: "
+              + blockInfo.getBlockID());
+    }
+    position = pos;

Review comment:
       if seeking to the same position, need not set this flag to true right?

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockInputStream.java
##########
@@ -281,6 +282,69 @@ public void testReadCrossingMultipleECChunkBounds() throws IOException {
     }
   }
 
+  @Test(expected = EOFException.class)
+  public void testSeekPastBlockLength() throws IOException {
+    repConfig = new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+        ONEMB);
+    OmKeyLocationInfo keyInfo = createKeyInfo(repConfig, 5, 100);
+    try (ECBlockInputStream ecb = new ECBlockInputStream(repConfig,
+        keyInfo, true, null, null, streamFactory)) {
+      ecb.seek(1000);
+    }
+  }
+

Review comment:
       Please ignore this comment. I saw that test below. Thanks

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       Got it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#issuecomment-942470933


   Latest changes looks good to me. +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org