You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/02/25 02:41:24 UTC

[GitHub] [ozone] guihecheng opened a new pull request #3137: HDDS-6373. EC: Exclude pipeline upon container close instead of exclude DNs.

guihecheng opened a new pull request #3137:
URL: https://github.com/apache/ozone/pull/3137


   ## What changes were proposed in this pull request?
   
   Exclude pipeline upon container close instead of exclude DNs.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6373
   
   ## How was this patch tested?
   
   manula tests, see the JIRA above.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] guihecheng commented on a change in pull request #3137: HDDS-6373. EC: Exclude pipeline upon container close instead of exclude DNs.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #3137:
URL: https://github.com/apache/ozone/pull/3137#discussion_r814741847



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
##########
@@ -300,36 +305,51 @@ private StripeWriteStatus handleParityWrites(int parityCellSize,
   }
 
   private boolean hasWriteFailure() {
-    List<ECBlockOutputStream> failedStreams =
-        blockOutputStreamEntryPool.getCurrentStreamEntry()
-            .streamsWithWriteFailure();
-    // Since writes are async, let's check the failures once.
-    if (failedStreams.size() > 0) {
-      addToExcludeNodesList(failedStreams);
-      return true;
-    }
-    return false;
+    return !blockOutputStreamEntryPool.getCurrentStreamEntry()
+        .streamsWithWriteFailure().isEmpty();
   }
 
   private boolean hasPutBlockFailure() {
-    List<ECBlockOutputStream> failedStreams =
-        blockOutputStreamEntryPool.getCurrentStreamEntry()
-            .streamsWithPutBlockFailure();
-    // Since writes are async, let's check the failures once.
-    if (failedStreams.size() > 0) {
-      addToExcludeNodesList(failedStreams);
-      return true;
-    }
-    return false;
+    return !blockOutputStreamEntryPool.getCurrentStreamEntry()
+        .streamsWithPutBlockFailure().isEmpty();
   }
 
-  private void addToExcludeNodesList(List<ECBlockOutputStream> failedStreams) {
+  private void handleFailedStreams(boolean forPutBlock) {
+    ECBlockOutputStreamEntry currentStreamEntry =
+        blockOutputStreamEntryPool.getCurrentStreamEntry();
+    List<ECBlockOutputStream> failedStreams = forPutBlock
+        ? currentStreamEntry.streamsWithPutBlockFailure()
+        : currentStreamEntry.streamsWithWriteFailure();
+
+    // Since writes are async, let's check the failures once.
+    boolean containerToExcludeAll = true;
     for (ECBlockOutputStream failedStream : failedStreams) {
+      Throwable cause = HddsClientUtils.checkForException(
+          failedStream.getIoException());
+      Preconditions.checkNotNull(cause);
+      if (!checkIfContainerToExclude(cause)) {
+        blockOutputStreamEntryPool.getExcludeList()
+            .addDatanode(failedStream.getDatanodeDetails());
+        containerToExcludeAll = false;
+      }
+    }
+
+    // NOTE: For now, this is mainly for ContainerNotOpenException
+    // due to container full, but may also for those cases that
+    // a DN do respond but with one with certain failures.
+    // In such cases we don't treat the replied DNs as failed.
+    if (containerToExcludeAll) {
       blockOutputStreamEntryPool.getExcludeList()
-          .addDatanode(failedStream.getDatanodeDetails());
+          .addPipeline(currentStreamEntry.getPipeline().getId());
     }
   }
 
+  @Override
+  protected boolean checkIfContainerToExclude(Throwable t) {
+    return super.checkIfContainerToExclude(t)
+        && t instanceof ContainerNotOpenException;

Review comment:
       Thanks for your comments Stephen, the thought comes into my mind too, we might could just drop the override method and stay consistent with Ratis path. But just as you mentioned there could be other StorageContainerExceptions returned from a DN, so here the problem I want to address is the container close case, so I just check the specific ContainerNotOpenException and for other exceptions I just try to leave them in the original handling way -- just mark it as failed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #3137: HDDS-6373. EC: Exclude pipeline upon container close instead of exclude DNs.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #3137:
URL: https://github.com/apache/ozone/pull/3137#discussion_r816079296



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
##########
@@ -300,36 +305,51 @@ private StripeWriteStatus handleParityWrites(int parityCellSize,
   }
 
   private boolean hasWriteFailure() {
-    List<ECBlockOutputStream> failedStreams =
-        blockOutputStreamEntryPool.getCurrentStreamEntry()
-            .streamsWithWriteFailure();
-    // Since writes are async, let's check the failures once.
-    if (failedStreams.size() > 0) {
-      addToExcludeNodesList(failedStreams);
-      return true;
-    }
-    return false;
+    return !blockOutputStreamEntryPool.getCurrentStreamEntry()
+        .streamsWithWriteFailure().isEmpty();
   }
 
   private boolean hasPutBlockFailure() {
-    List<ECBlockOutputStream> failedStreams =
-        blockOutputStreamEntryPool.getCurrentStreamEntry()
-            .streamsWithPutBlockFailure();
-    // Since writes are async, let's check the failures once.
-    if (failedStreams.size() > 0) {
-      addToExcludeNodesList(failedStreams);
-      return true;
-    }
-    return false;
+    return !blockOutputStreamEntryPool.getCurrentStreamEntry()
+        .streamsWithPutBlockFailure().isEmpty();
   }
 
-  private void addToExcludeNodesList(List<ECBlockOutputStream> failedStreams) {
+  private void handleFailedStreams(boolean forPutBlock) {
+    ECBlockOutputStreamEntry currentStreamEntry =
+        blockOutputStreamEntryPool.getCurrentStreamEntry();
+    List<ECBlockOutputStream> failedStreams = forPutBlock
+        ? currentStreamEntry.streamsWithPutBlockFailure()
+        : currentStreamEntry.streamsWithWriteFailure();
+
+    // Since writes are async, let's check the failures once.
+    boolean containerToExcludeAll = true;
     for (ECBlockOutputStream failedStream : failedStreams) {
+      Throwable cause = HddsClientUtils.checkForException(
+          failedStream.getIoException());
+      Preconditions.checkNotNull(cause);
+      if (!checkIfContainerToExclude(cause)) {
+        blockOutputStreamEntryPool.getExcludeList()
+            .addDatanode(failedStream.getDatanodeDetails());
+        containerToExcludeAll = false;
+      }
+    }
+
+    // NOTE: For now, this is mainly for ContainerNotOpenException
+    // due to container full, but may also for those cases that
+    // a DN do respond but with one with certain failures.
+    // In such cases we don't treat the replied DNs as failed.
+    if (containerToExcludeAll) {
       blockOutputStreamEntryPool.getExcludeList()
-          .addDatanode(failedStream.getDatanodeDetails());
+          .addPipeline(currentStreamEntry.getPipeline().getId());
     }
   }
 
+  @Override
+  protected boolean checkIfContainerToExclude(Throwable t) {
+    return super.checkIfContainerToExclude(t)
+        && t instanceof ContainerNotOpenException;

Review comment:
       OK - it makes sense. We can start with just handing ContainerNotOpenException and then see what else comes up over time.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #3137: HDDS-6373. EC: Exclude pipeline upon container close instead of exclude DNs.

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged pull request #3137:
URL: https://github.com/apache/ozone/pull/3137


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #3137: HDDS-6373. EC: Exclude pipeline upon container close instead of exclude DNs.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #3137:
URL: https://github.com/apache/ozone/pull/3137#discussion_r814650748



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
##########
@@ -300,36 +305,51 @@ private StripeWriteStatus handleParityWrites(int parityCellSize,
   }
 
   private boolean hasWriteFailure() {
-    List<ECBlockOutputStream> failedStreams =
-        blockOutputStreamEntryPool.getCurrentStreamEntry()
-            .streamsWithWriteFailure();
-    // Since writes are async, let's check the failures once.
-    if (failedStreams.size() > 0) {
-      addToExcludeNodesList(failedStreams);
-      return true;
-    }
-    return false;
+    return !blockOutputStreamEntryPool.getCurrentStreamEntry()
+        .streamsWithWriteFailure().isEmpty();
   }
 
   private boolean hasPutBlockFailure() {
-    List<ECBlockOutputStream> failedStreams =
-        blockOutputStreamEntryPool.getCurrentStreamEntry()
-            .streamsWithPutBlockFailure();
-    // Since writes are async, let's check the failures once.
-    if (failedStreams.size() > 0) {
-      addToExcludeNodesList(failedStreams);
-      return true;
-    }
-    return false;
+    return !blockOutputStreamEntryPool.getCurrentStreamEntry()
+        .streamsWithPutBlockFailure().isEmpty();
   }
 
-  private void addToExcludeNodesList(List<ECBlockOutputStream> failedStreams) {
+  private void handleFailedStreams(boolean forPutBlock) {
+    ECBlockOutputStreamEntry currentStreamEntry =
+        blockOutputStreamEntryPool.getCurrentStreamEntry();
+    List<ECBlockOutputStream> failedStreams = forPutBlock
+        ? currentStreamEntry.streamsWithPutBlockFailure()
+        : currentStreamEntry.streamsWithWriteFailure();
+
+    // Since writes are async, let's check the failures once.
+    boolean containerToExcludeAll = true;
     for (ECBlockOutputStream failedStream : failedStreams) {
+      Throwable cause = HddsClientUtils.checkForException(
+          failedStream.getIoException());
+      Preconditions.checkNotNull(cause);
+      if (!checkIfContainerToExclude(cause)) {
+        blockOutputStreamEntryPool.getExcludeList()
+            .addDatanode(failedStream.getDatanodeDetails());
+        containerToExcludeAll = false;
+      }
+    }
+
+    // NOTE: For now, this is mainly for ContainerNotOpenException
+    // due to container full, but may also for those cases that
+    // a DN do respond but with one with certain failures.
+    // In such cases we don't treat the replied DNs as failed.
+    if (containerToExcludeAll) {
       blockOutputStreamEntryPool.getExcludeList()
-          .addDatanode(failedStream.getDatanodeDetails());
+          .addPipeline(currentStreamEntry.getPipeline().getId());
     }
   }
 
+  @Override
+  protected boolean checkIfContainerToExclude(Throwable t) {
+    return super.checkIfContainerToExclude(t)
+        && t instanceof ContainerNotOpenException;

Review comment:
       I don't know what the best approach is:
   
   It seems that in the existing Ratis code, there is already this checkIfContainerToExclude method, and it excludes the container rather than the DN if there is any exception returned from the DN. If the DN is able to return an exception, then it is still up. However I guess it could somehow be in a state where everything on it is failing (I don't know how that may happen).
   
   The existing code makes me think that we many not need the ContainerNotOpenException check here, and just use the existing Super method to handle things in the same way as for Ratis writes? That way  it is consistent and we will handle other potential errors from the DN too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3137: HDDS-6373. EC: Exclude pipeline upon container close instead of exclude DNs.

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #3137:
URL: https://github.com/apache/ozone/pull/3137#issuecomment-1050729569


   Thanks for looking into this issue. Myself and Uma discussed something related to this recently, but we did not get time to address it yet. The change looks mostly good to me - I just had one comment / question.
   
   I would like @umamaheswararao to review it too, as he knows the write code path better than I do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org