You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/11/11 17:34:39 UTC

[GitHub] [ozone] sodonnel opened a new pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

sodonnel opened a new pull request #2831:
URL: https://github.com/apache/ozone/pull/2831


   ## What changes were proposed in this pull request?
   
   Adapt ECBlockReconstructedStripeInputStream to catch errors reading from a block and continue reading from other available blocks if there are enough left to reconstruct the stripe.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5951
   
   ## How was this patch tested?
   
   New Unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#discussion_r749517340



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockReconstructedStripeInputStream.java
##########
@@ -422,12 +467,15 @@ public synchronized boolean hasSufficientLocations() {
     int availableLocations =
         availableDataLocations() + availableParityLocations();
     int paddedLocations = repConfig.getData() - expectedDataBlocks;
+    int failedLocations = failedDataIndexes.size();
 
-    if (availableLocations + paddedLocations >= repConfig.getData()) {
+    if (availableLocations + paddedLocations - failedLocations
+        >= repConfig.getData()) {
       return true;
     } else {

Review comment:
       Yea, probably should be an error.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#discussion_r749516710



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockReconstructedStripeInputStream.java
##########
@@ -354,8 +355,8 @@ public void testErrorThrownIfBlockNotLongEnough() throws IOException {
       try {
         ecb.readStripe(bufs);
         Assert.fail("Read should have thrown an exception");
-      } catch (IOException e) {
-        Assert.assertTrue(e.getMessage().matches("^Expected\\sto\\sread.+"));
+      } catch (InsufficientLocationsException e) {

Review comment:
       Yea, I am aware there is some JUnit magic to do this sort of thing, but find it easier to understand like this and avoid the "magic" annotations




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#issuecomment-969106348


   Thank you @sodonnel  for the patch. Patch looks good to me. Dropped few nits there.
   Looks like there should be at least one more JIRA to track first time failure from ECBlobkInputStream and retry with ECBloblReconstructedStripeInputStream right?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2831:
URL: https://github.com/apache/ozone/pull/2831


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao edited a comment on pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
umamaheswararao edited a comment on pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#issuecomment-969106348


   Thank you @sodonnel  for the patch. Patch looks good to me. Dropped few nits though.
   Looks like there should be at least one more JIRA to track first time failure from ECBlobkInputStream and retry with ECBloblReconstructedStripeInputStream right?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#discussion_r749519064



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockReconstructedStripeInputStream.java
##########
@@ -210,12 +224,28 @@ public synchronized int readStripe(ByteBuffer[] bufs) throws IOException {
       return EOF;
     }
     validateBuffers(bufs);
-    assignBuffers(bufs);
-    clearParityBuffers();
-    // Set the read limits on the buffers so we do not read any garbage data
-    // from the end of the block that is unexpected.
-    setBufferReadLimits(bufs, toRead);
-    loadDataBuffersFromStream();
+    while(true) {
+      try {
+        assignBuffers(bufs);
+        clearParityBuffers();
+        // Set the read limits on the buffers so we do not read any garbage data
+        // from the end of the block that is unexpected.
+        setBufferReadLimits(bufs, toRead);
+        loadDataBuffersFromStream();
+        break;
+      } catch (IOException e) {
+        // Re-init now the bad block has been excluded. If we have ran out of

Review comment:
       We log the block exception down in `loadDataBuffersFromStream()` so that should be enough. What do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#discussion_r749597914



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockReconstructedStripeInputStream.java
##########
@@ -210,12 +224,28 @@ public synchronized int readStripe(ByteBuffer[] bufs) throws IOException {
       return EOF;
     }
     validateBuffers(bufs);
-    assignBuffers(bufs);
-    clearParityBuffers();
-    // Set the read limits on the buffers so we do not read any garbage data
-    // from the end of the block that is unexpected.
-    setBufferReadLimits(bufs, toRead);
-    loadDataBuffersFromStream();
+    while(true) {
+      try {
+        assignBuffers(bufs);
+        clearParityBuffers();
+        // Set the read limits on the buffers so we do not read any garbage data
+        // from the end of the block that is unexpected.
+        setBufferReadLimits(bufs, toRead);
+        loadDataBuffersFromStream();
+        break;
+      } catch (IOException e) {
+        // Re-init now the bad block has been excluded. If we have ran out of

Review comment:
       I see. I think that should be fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2831: HDDS-5951. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2831:
URL: https://github.com/apache/ozone/pull/2831#discussion_r749498911



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockReconstructedStripeInputStream.java
##########
@@ -422,12 +467,15 @@ public synchronized boolean hasSufficientLocations() {
     int availableLocations =
         availableDataLocations() + availableParityLocations();
     int paddedLocations = repConfig.getData() - expectedDataBlocks;
+    int failedLocations = failedDataIndexes.size();
 
-    if (availableLocations + paddedLocations >= repConfig.getData()) {
+    if (availableLocations + paddedLocations - failedLocations
+        >= repConfig.getData()) {
       return true;
     } else {

Review comment:
       Should this be an error instead of warn? We can't proceed right

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestECBlockReconstructedStripeInputStream.java
##########
@@ -354,8 +355,8 @@ public void testErrorThrownIfBlockNotLongEnough() throws IOException {
       try {
         ecb.readStripe(bufs);
         Assert.fail("Read should have thrown an exception");
-      } catch (IOException e) {
-        Assert.assertTrue(e.getMessage().matches("^Expected\\sto\\sread.+"));
+      } catch (InsufficientLocationsException e) {

Review comment:
       Instead of empty catch, we could also add header annotation in test (expected = InsufficientLocationsException.class). But I am not worried though.

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockReconstructedStripeInputStream.java
##########
@@ -210,12 +224,28 @@ public synchronized int readStripe(ByteBuffer[] bufs) throws IOException {
       return EOF;
     }
     validateBuffers(bufs);
-    assignBuffers(bufs);
-    clearParityBuffers();
-    // Set the read limits on the buffers so we do not read any garbage data
-    // from the end of the block that is unexpected.
-    setBufferReadLimits(bufs, toRead);
-    loadDataBuffersFromStream();
+    while(true) {
+      try {
+        assignBuffers(bufs);
+        clearParityBuffers();
+        // Set the read limits on the buffers so we do not read any garbage data
+        // from the end of the block that is unexpected.
+        setBufferReadLimits(bufs, toRead);
+        loadDataBuffersFromStream();
+        break;
+      } catch (IOException e) {
+        // Re-init now the bad block has been excluded. If we have ran out of

Review comment:
       I think we should log here about the retry?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org