Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/11/03 05:35:30 UTC

[GitHub] [ozone] umamaheswararao commented on a change in pull request #2767: HDDS-5491: EC: Write should handle node failures.

umamaheswararao commented on a change in pull request #2767:
URL: https://github.com/apache/ozone/pull/2767#discussion_r741621874



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ECBlockOutputStream.java
##########
@@ -63,7 +69,65 @@ public ECBlockOutputStream(
 
   @Override
   public void write(byte[] b, int off, int len) throws IOException {
-    writeChunkToContainer(ChunkBuffer.wrap(ByteBuffer.wrap(b, off, len)));
+    this.currentChunkRspFuture =
+        writeChunkToContainer(ChunkBuffer.wrap(ByteBuffer.wrap(b, off, len)));
+  }
+
+  /**
+   * Writes buffered data as a new chunk to the container and saves chunk
+   * information to be used later in putKey call.
+   *
+   * @throws IOException if there is an I/O error while performing the call
+   * @throws OzoneChecksumException if there is an error while computing
+   * checksum
+   * @return ContainerCommandResponseProto
+   */
+  CompletableFuture<ContainerProtos.ContainerCommandResponseProto>
+      writeChunkToContainer(ChunkBuffer chunk) throws IOException {

Review comment:
       I agree, this is a good point. I started by copying that code part and proceeded with it. In fact, we can remove this code and move addChunkInfo up into the try block there in BOS (BlockOutputStream).
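   For illustration, a minimal sketch of that suggestion; the types Chunk/Response and the helper writeChunkAsync here are placeholders standing in for the real Ozone APIs, not the final patch:
    ```
    // Record the chunk only once the async dispatch has been issued, so a
    // failed dispatch never registers a chunk that was never sent.
    CompletableFuture<Response> writeChunkToContainer(Chunk chunk)
        throws IOException {
      try {
        CompletableFuture<Response> future = writeChunkAsync(chunk);
        containerBlockData.addChunks(chunk); // moved up into the try block
        return future;
      } catch (Exception e) {
        throw new IOException("Failed to dispatch chunk write", e);
      }
    }
    ```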

##########
File path: hadoop-ozone/client/src/test/java/org/apache/hadoop/ozone/client/TestOzoneECClient.java
##########
@@ -420,6 +420,80 @@ public void testPartialStripeWithPartialLastChunk()
     }
   }
 
+  @Test
+  public void testWriteShouldFailIfMoreThanParityNodesFail()
+      throws IOException {
+    testNodeFailuresWhileWriting(3, 3);
+  }
+
+  @Test
+  public void testWriteShouldSuccessIfLessThanParityNodesFail()
+      throws IOException {
+    testNodeFailuresWhileWriting(1, 2);
+  }
+
+  @Test
+  public void testWriteShouldSuccessIf4NodesFailed()
+      throws IOException {
+    testNodeFailuresWhileWriting(4, 1);
+  }
+
+  @Test
+  public void testWriteShouldSuccessIfAllNodesFailed()
+      throws IOException {
+    testNodeFailuresWhileWriting(4, 1);
+  }
+
+  public void testNodeFailuresWhileWriting(int numFailureToInject,
+      int numChunksToWriteAfterFailure) throws IOException {
+    store.createVolume(volumeName);
+    OzoneVolume volume = store.getVolume(volumeName);
+    volume.createBucket(bucketName);
+    OzoneBucket bucket = volume.getBucket(bucketName);
+
+    try (OzoneOutputStream out = bucket.createKey(keyName, 1024 * 3,
+        new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+            chunkSize), new HashMap<>())) {
+      for (int i = 0; i < dataBlocks; i++) {
+        out.write(inputChunks[i]);
+      }
+
+      List<DatanodeDetails> failedDNs = new ArrayList<>();
+      Map<DatanodeDetails, MockDatanodeStorage> storages =
+          ((MockXceiverClientFactory) factoryStub).getStorages();
+      DatanodeDetails[] dnDetails =
+          storages.keySet().toArray(new DatanodeDetails[storages.size()]);
+      for (int i = 0; i < numFailureToInject; i++) {
+        failedDNs.add(dnDetails[i]);
+      }
+
+      // First let's set storage as bad
+      ((MockXceiverClientFactory) factoryStub).setFailedStorages(failedDNs);
+
+      for (int i = 0; i < numChunksToWriteAfterFailure; i++) {
+        out.write(inputChunks[i]);
+      }
+    }
+    final OzoneKeyDetails key = bucket.getKey(keyName);
+    Assert.assertEquals(2, key.getOzoneKeyLocations().size());

Review comment:
       The data is supposed to be stored in a single block group. Since we introduced the failures after the first stripe, the second stripe's data should have been written into a new block group. So we should have 2 block groups, which means two key locations.
   
   I added this comment to explain.
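   To make the arithmetic behind the assertion concrete, a sketch of the expected layout under the RS(3, 2) config used in this test:
    ```
    // The first dataBlocks writes fill exactly one stripe:
    //
    //   stripe 1 -> block group A  (written before failures are injected)
    //   stripe 2 -> block group B  (re-allocated after the failure is seen)
    //
    // One key location is reported per block group, hence:
    Assert.assertEquals(2, key.getOzoneKeyLocations().size());
    ```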

##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
##########
@@ -239,7 +293,15 @@ private void handleParityWrites(int parityCellSize,
     // TODO: we should alter the put block calls to share CRC to each stream.
     ECBlockOutputStreamEntry streamEntry =
         blockOutputStreamEntryPool.getCurrentStreamEntry();
+    // Since writes are async, let's check the failures once.
+    if(streamEntry.checkStreamFailures()){

Review comment:
       The problem is that we cannot let the writer run asynchronously beyond one full stripe. Otherwise we would have to cache all the stripe data, which could be costly. Say we allow multiple stripes to proceed asynchronously and, after writing several stripes of data, the first stripe receives a failure: if we haven't cached all of the stripes written so far, we can't copy that data back to a new block group. We can separately explore caching some stripes elastically to allow more asynchrony, but currently the async behavior applies at the stripe level. I checked HDFS and we are in line with that behavior: right after writing the parity cells, it checks for failures.
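   A rough sketch of that stripe-level control flow (the method names here are placeholders, not the actual ECKeyOutputStream API):
    ```
    // Writes within one stripe are async; the stream synchronizes at each
    // stripe boundary, right after the parity cells, so at most one stripe
    // of data ever has to be buffered for replay.
    void writeStripe(byte[][] dataCells, byte[][] parityCells)
        throws IOException {
      for (byte[] cell : dataCells) {
        writeCellAsync(cell);                // async within the stripe
      }
      for (byte[] cell : parityCells) {
        writeCellAsync(cell);                // parity cells, still async
      }
      if (currentStreamEntryHasFailures()) { // stripe-boundary check
        // Only the current stripe is cached, so it can be replayed into
        // a freshly allocated block group.
        rewriteCurrentStripeToNewBlockGroup();
      }
    }
    ```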

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ECBlockOutputStream.java
##########
@@ -35,12 +38,15 @@
 import java.util.concurrent.ExecutionException;
 
 import static org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.putBlockAsync;
+import static org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.writeChunkAsync;
 
 /**
  * Handles the chunk EC writes for an EC internal block.
  */
 public class ECBlockOutputStream extends BlockOutputStream{
 
+  private CompletableFuture<ContainerProtos.ContainerCommandResponseProto>
+      currentChunkRspFuture = null;

Review comment:
       Currently, in the write and close flows, we already make sure to check for failures just before invoking executePutBlock. For flush, I don't think we have officially tested it yet; that needs separate tests. That future is actually used to validate the response for failures. I would be quite surprised if a write replied with an exception and a subsequent executePutBlock replied with success. When we test and handle flush, we need to make sure we check for failures before writing the parities. But it is very hard to support flush, and IIRC we are not supporting flush for EC.
   From HDFS:
     ```
   @Override
     public void hflush() {
       // not supported yet
       LOG.debug("DFSStripedOutputStream does not support hflush. "
           + "Caller should check StreamCapabilities before calling.");
     }
   ```
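   For reference, a usage sketch of how a caller is expected to probe for flush support via the standard Hadoop StreamCapabilities API (an EC stream would simply not report the hflush capability):
    ```
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.StreamCapabilities;

    static void maybeFlush(FSDataOutputStream out) throws IOException {
      // FSDataOutputStream implements StreamCapabilities and delegates to
      // the wrapped stream, so this reflects what the EC stream supports.
      if (out.hasCapability(StreamCapabilities.HFLUSH)) {
        out.hflush();
      }
      // else: hflush is unsupported; durability is per completed stripe.
    }
    ```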

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java
##########
@@ -700,6 +714,7 @@ void writeChunkToContainer(ChunkBuffer chunk) throws IOException {
       handleInterruptedException(ex, false);
     }
     containerBlockData.addChunks(chunkInfo);
+    return null;

Review comment:
       This is a good point. I have added that.
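   Assuming the point being acknowledged is to return the real write-chunk future rather than null, a minimal sketch (dispatchWriteChunkAsync and toChunkInfo are placeholder names, not the real helpers):
    ```
    // Return the future from the async dispatch so ECBlockOutputStream can
    // keep it and validate the response for failures later.
    CompletableFuture<ContainerProtos.ContainerCommandResponseProto>
        writeChunkToContainer(ChunkBuffer chunk) throws IOException {
      CompletableFuture<ContainerProtos.ContainerCommandResponseProto> future =
          dispatchWriteChunkAsync(chunk);       // placeholder async dispatch
      containerBlockData.addChunks(toChunkInfo(chunk)); // record metadata
      return future;                            // previously: return null
    }
    ```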




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


