You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/07/16 04:01:17 UTC

[GitHub] [ozone] ChenSammi opened a new pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

ChenSammi opened a new pull request #2420:
URL: https://github.com/apache/ozone/pull/2420


   https://issues.apache.org/jira/browse/HDDS-5360


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r676264678



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -76,34 +84,179 @@
   private final ConfigurationSource conf;
   private int invocationCount;
   private long totalTime;
-  private boolean cmdExecuted;
+  private final ExecutorService executor;
+  private final LinkedBlockingQueue<DeleteCmdInfo> deleteCommandQueues;
+  private final Daemon handlerThread;
 
   public DeleteBlocksCommandHandler(ContainerSet cset,
-      ConfigurationSource conf) {
+      ConfigurationSource conf, int threadPoolSize, int queueLimit) {
     this.containerSet = cset;
     this.conf = conf;
+    this.executor = new ThreadPoolExecutor(
+        0, threadPoolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("DeleteBlocksCommandHandlerThread-%d")
+            .build());
+    this.deleteCommandQueues = new LinkedBlockingQueue<>(queueLimit);
+    handlerThread = new Daemon(new DeleteCmdWorker());
+    handlerThread.start();
   }
 
   @Override
   public void handle(SCMCommand command, OzoneContainer container,
       StateContext context, SCMConnectionManager connectionManager) {
-    cmdExecuted = false;
-    long startTime = Time.monotonicNow();
-    ContainerBlocksDeletionACKProto blockDeletionACK = null;
+    if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          SCMCommandProto.Type.deleteBlocksCommand, command.getType());
+      return;
+    }
+
     try {
-      if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
-        LOG.warn("Skipping handling command, expected command "
-                + "type {} but found {}",
-            SCMCommandProto.Type.deleteBlocksCommand, command.getType());
-        return;
+      DeleteCmdInfo cmd = new DeleteCmdInfo((DeleteBlocksCommand) command,
+          container, context, connectionManager);
+      deleteCommandQueues.add(cmd);
+    } catch (IllegalStateException e) {
+      LOG.warn("Command is discarded because of the command queue is full");
+      return;
+    }
+  }
+
+  /**
+   * A delete command info.
+   */
+  public static final class DeleteCmdInfo {
+    private DeleteBlocksCommand cmd;
+    private StateContext context;
+    private OzoneContainer container;
+    private SCMConnectionManager connectionManager;
+
+    public DeleteCmdInfo(DeleteBlocksCommand command, OzoneContainer container,
+        StateContext context, SCMConnectionManager connectionManager) {
+      this.cmd = command;
+      this.context = context;
+      this.container = container;
+      this.connectionManager = connectionManager;
+    }
+    public DeleteBlocksCommand getCmd() {
+      return this.cmd;
+    }
+    public StateContext getContext() {
+      return this.context;
+    }
+    public OzoneContainer getContainer() {
+      return this.container;
+    }
+    public SCMConnectionManager getConnectionManager() {
+      return this.connectionManager;
+    }
+  }
+
+  /**
+   * Process delete commands.
+   */
+  public final class DeleteCmdWorker implements Runnable {
+
+    @Override
+    public void run() {
+      while (true) {
+        while (!deleteCommandQueues.isEmpty()) {
+          DeleteCmdInfo cmd = deleteCommandQueues.poll();
+          try {
+            processCmd(cmd.getCmd(), cmd.getContainer(), cmd.getContext(),

Review comment:
       thanks for the work ! NIT, can we use `DeleteCmdInfo` as the only one parameter of `processCmd`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi edited a comment on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-881275959


   ![图片](https://user-images.githubusercontent.com/19790142/125918077-fe99f6ab-66af-4e35-b83d-2629555d536b.png)
   
   This is the rocksDB cache metrics data captured from one DN.  It costs about 200ms for open a rocksdb to persist the block deletion trasaction command.  Becased on current default configuration,  one DN can handle at most 60s / 200ms = 300 rocksdb opens if cache missed in 60s which is SCM block deleting command generation interval. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r672938152



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -76,34 +84,177 @@
   private final ConfigurationSource conf;
   private int invocationCount;
   private long totalTime;
-  private boolean cmdExecuted;
+  private final ExecutorService executor;
+  private final LinkedBlockingQueue<DeleteCmdInfo> deleteCommandQueues;
+  private final Daemon handlerThread;
 
   public DeleteBlocksCommandHandler(ContainerSet cset,
-      ConfigurationSource conf) {
+      ConfigurationSource conf, int threadPoolSize, int queueLimit) {
     this.containerSet = cset;
     this.conf = conf;
+    this.executor = new ThreadPoolExecutor(
+        0, threadPoolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("DeleteBlocksCommandHandlerThread-%d")
+            .build());
+    this.deleteCommandQueues = new LinkedBlockingQueue<>(queueLimit);
+    handlerThread = new Daemon(new DeleteCmdWorker());
+    handlerThread.start();
   }
 
   @Override
   public void handle(SCMCommand command, OzoneContainer container,
       StateContext context, SCMConnectionManager connectionManager) {
-    cmdExecuted = false;
-    long startTime = Time.monotonicNow();
-    ContainerBlocksDeletionACKProto blockDeletionACK = null;
+    if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          SCMCommandProto.Type.deleteBlocksCommand, command.getType());
+      return;
+    }
+
     try {
-      if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
-        LOG.warn("Skipping handling command, expected command "
-                + "type {} but found {}",
-            SCMCommandProto.Type.deleteBlocksCommand, command.getType());
-        return;
+      DeleteCmdInfo cmd = new DeleteCmdInfo((DeleteBlocksCommand) command,
+          container, context, connectionManager);
+      deleteCommandQueues.add(cmd);
+    } catch (Exception e) {
+      LOG.warn("Command is discarded because of the command queue is full");

Review comment:
       is this right? Looking at AbstractQueue, the add() can throw these exceptions:
   
   IllegalStateException - if the element cannot be added at this time due to capacity restrictions
   ClassCastException - if the class of the specified element prevents it from being added to this queue
   NullPointerException - if the specified element is null and this queue does not permit null elements
   IllegalArgumentException - if some property of this element prevents it from being added to this queue




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r686171070



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/TestContainerReplication.java
##########
@@ -134,7 +134,6 @@ public void testContainerReplication() throws Exception {
 
     cluster.shutdownHddsDatanode(keyLocation.getPipeline().getFirstNode());
 
-    waitForReplicaCount(containerID, 2, cluster);

Review comment:
       Why was this removed?  Without this, the test may give false negative: replica count may be 3 both initially and after re-replication, and we need to distinguish between the two.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi edited a comment on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883205178


   @jojochuang , do you mean 1-db-per-volume design?  We think the same way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883205178


   > container
   
   Do you mean 1-db-per-volume design?  We think the same way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r686782571



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/TestContainerReplication.java
##########
@@ -134,7 +134,6 @@ public void testContainerReplication() throws Exception {
 
     cluster.shutdownHddsDatanode(keyLocation.getPipeline().getFirstNode());
 
-    waitForReplicaCount(containerID, 2, cluster);

Review comment:
       Because delete thread pool shutdown takes time, when it finishes, the replication manager already have the container replicated. So the wait for 2 replicas randomly fails because of there are already 3 replicas. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi edited a comment on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883205178


   @jojochuang , do you mean 1-db-per-volume design?  We think the same way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi edited a comment on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883205178


   @jojochuang , do you mean 1-db-per-volume design?  We think the same way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r673598456



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -76,34 +84,178 @@
   private final ConfigurationSource conf;
   private int invocationCount;
   private long totalTime;
-  private boolean cmdExecuted;
+  private final ExecutorService executor;
+  private final LinkedBlockingQueue<DeleteCmdInfo> deleteCommandQueues;
+  private final Daemon handlerThread;
 
   public DeleteBlocksCommandHandler(ContainerSet cset,
-      ConfigurationSource conf) {
+      ConfigurationSource conf, int threadPoolSize, int queueLimit) {
     this.containerSet = cset;
     this.conf = conf;
+    this.executor = new ThreadPoolExecutor(
+        0, threadPoolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("DeleteBlocksCommandHandlerThread-%d")
+            .build());
+    this.deleteCommandQueues = new LinkedBlockingQueue<>(queueLimit);
+    handlerThread = new Daemon(new DeleteCmdWorker());
+    handlerThread.start();
   }
 
   @Override
   public void handle(SCMCommand command, OzoneContainer container,
       StateContext context, SCMConnectionManager connectionManager) {
-    cmdExecuted = false;
-    long startTime = Time.monotonicNow();
-    ContainerBlocksDeletionACKProto blockDeletionACK = null;
+    if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          SCMCommandProto.Type.deleteBlocksCommand, command.getType());
+      return;
+    }
+
     try {
-      if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
-        LOG.warn("Skipping handling command, expected command "
-                + "type {} but found {}",
-            SCMCommandProto.Type.deleteBlocksCommand, command.getType());
-        return;
+      DeleteCmdInfo cmd = new DeleteCmdInfo((DeleteBlocksCommand) command,
+          container, context, connectionManager);
+      deleteCommandQueues.add(cmd);
+    } catch (IllegalStateException e) {
+      LOG.warn("Command is discarded because of the command queue is full");
+      return;
+    }
+  }
+
+  /**
+   * A delete command info.
+   */
+  public static final class DeleteCmdInfo {
+    private DeleteBlocksCommand cmd;
+    private StateContext context;
+    private OzoneContainer container;
+    private SCMConnectionManager connectionManager;
+
+    public DeleteCmdInfo(DeleteBlocksCommand command, OzoneContainer container,
+        StateContext context, SCMConnectionManager connectionManager) {
+      this.cmd = command;
+      this.context = context;
+      this.container = container;
+      this.connectionManager = connectionManager;
+    }
+    public DeleteBlocksCommand getCmd() {
+      return this.cmd;
+    }
+    public StateContext getContext() {
+      return this.context;
+    }
+    public OzoneContainer getContainer() {
+      return this.container;
+    }
+    public SCMConnectionManager getConnectionManager() {
+      return this.connectionManager;
+    }
+  }
+
+  /**
+   * Process delete commands.
+   */
+  public final class DeleteCmdWorker implements Runnable {
+
+    @Override
+    public void run() {
+      while (true) {
+        while (!deleteCommandQueues.isEmpty()) {
+          DeleteCmdInfo cmd = deleteCommandQueues.poll();
+          try {
+            processCmd(cmd.getCmd(), cmd.getContainer(), cmd.getContext(),
+                cmd.getConnectionManager());
+          } catch (Throwable e) {
+            LOG.error("taskProcess failed.", e);
+          }
+        }
+
+        try {
+          Thread.sleep(2000);
+        } catch (InterruptedException e) {

Review comment:
       Oh one more thing. We should do
   `Thread.currentThread().interrupt();`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883205178


   > container
   
   Do you mean 1-db-per-volume design?  We think the same way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r671062474



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -115,56 +201,71 @@ public void handle(SCMCommand command, OzoneContainer container,
 
       ContainerBlocksDeletionACKProto.Builder resultBuilder =
           ContainerBlocksDeletionACKProto.newBuilder();
-      containerBlocks.forEach(entry -> {
-        DeleteBlockTransactionResult.Builder txResultBuilder =
-            DeleteBlockTransactionResult.newBuilder();
-        txResultBuilder.setTxID(entry.getTxID());
-        long containerId = entry.getContainerID();
-        int newDeletionBlocks = 0;
-        try {
-          Container cont = containerSet.getContainer(containerId);
-          if (cont == null) {
-            throw new StorageContainerException("Unable to find the container "
-                + containerId, CONTAINER_NOT_FOUND);
-          }
-          ContainerProtos.ContainerType containerType = cont.getContainerType();
-          switch (containerType) {
-          case KeyValueContainer:
-            KeyValueContainerData containerData = (KeyValueContainerData)
-                cont.getContainerData();
-            cont.writeLock();
-            try {
-              if (containerData.getSchemaVersion().equals(SCHEMA_V1)) {
-                markBlocksForDeletionSchemaV1(containerData, entry);
-              } else if (containerData.getSchemaVersion().equals(SCHEMA_V2)) {
-                markBlocksForDeletionSchemaV2(containerData, entry,
-                    newDeletionBlocks, entry.getTxID());
-              } else {
-                throw new UnsupportedOperationException(
-                    "Only schema version 1 and schema version 2 are "
-                        + "supported.");
+      List<Future> futures = new ArrayList<>();
+      for (int i = 0; i < containerBlocks.size(); i++) {
+        DeletedBlocksTransaction tx = containerBlocks.get(i);
+        Future future = executor.submit(() -> {

Review comment:
       this is hard to read. Can you move out this Callable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-896840505


   > Sure. I will take a look of timeout issue.
   
   Thanks.  Filed HDDS-5605 for it.  HDDS-5606 is also related, but it started happening earlier (few weeks ago).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-895766573


   Thanks @jojochuang and @JacksonYao287  for the code review.  Also thanks @adoroszlai for taking care of the "no space left" issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r672938152



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -76,34 +84,177 @@
   private final ConfigurationSource conf;
   private int invocationCount;
   private long totalTime;
-  private boolean cmdExecuted;
+  private final ExecutorService executor;
+  private final LinkedBlockingQueue<DeleteCmdInfo> deleteCommandQueues;
+  private final Daemon handlerThread;
 
   public DeleteBlocksCommandHandler(ContainerSet cset,
-      ConfigurationSource conf) {
+      ConfigurationSource conf, int threadPoolSize, int queueLimit) {
     this.containerSet = cset;
     this.conf = conf;
+    this.executor = new ThreadPoolExecutor(
+        0, threadPoolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("DeleteBlocksCommandHandlerThread-%d")
+            .build());
+    this.deleteCommandQueues = new LinkedBlockingQueue<>(queueLimit);
+    handlerThread = new Daemon(new DeleteCmdWorker());
+    handlerThread.start();
   }
 
   @Override
   public void handle(SCMCommand command, OzoneContainer container,
       StateContext context, SCMConnectionManager connectionManager) {
-    cmdExecuted = false;
-    long startTime = Time.monotonicNow();
-    ContainerBlocksDeletionACKProto blockDeletionACK = null;
+    if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          SCMCommandProto.Type.deleteBlocksCommand, command.getType());
+      return;
+    }
+
     try {
-      if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
-        LOG.warn("Skipping handling command, expected command "
-                + "type {} but found {}",
-            SCMCommandProto.Type.deleteBlocksCommand, command.getType());
-        return;
+      DeleteCmdInfo cmd = new DeleteCmdInfo((DeleteBlocksCommand) command,
+          container, context, connectionManager);
+      deleteCommandQueues.add(cmd);
+    } catch (Exception e) {
+      LOG.warn("Command is discarded because of the command queue is full");

Review comment:
       is this right? Looking at AbstractQueue, the add() can throw these exceptions:
   
   IllegalStateException - if the element cannot be added at this time due to capacity restrictions
   ClassCastException - if the class of the specified element prevents it from being added to this queue
   NullPointerException - if the specified element is null and this queue does not permit null elements
   IllegalArgumentException - if some property of this element prevents it from being added to this queue




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883205178


   > container
   
   Do you mean 1-db-per-volume design?  We think the same way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883167625


   > ![图片](https://user-images.githubusercontent.com/19790142/125918077-fe99f6ab-66af-4e35-b83d-2629555d536b.png)
   > 
   > This is the rocksDB cache metrics data captured from one DN. It costs about 200ms for open a rocksdb to persist the block deletion trasaction command. Becased on current default configuration, one DN can handle at most 60s / 200ms = 300 rocksdb opens if cache missed in 60s which is SCM block deleting command generation interval.
   
   This is yet another reason why we should reconsider the 1-db-per-container design :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883167625


   > ![图片](https://user-images.githubusercontent.com/19790142/125918077-fe99f6ab-66af-4e35-b83d-2629555d536b.png)
   > 
   > This is the rocksDB cache metrics data captured from one DN. It costs about 200ms for open a rocksdb to persist the block deletion trasaction command. Becased on current default configuration, one DN can handle at most 60s / 200ms = 300 rocksdb opens if cache missed in 60s which is SCM block deleting command generation interval.
   
   This is yet another reason why we should reconsider the 1-db-per-container design :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r672849623



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -115,56 +201,71 @@ public void handle(SCMCommand command, OzoneContainer container,
 
       ContainerBlocksDeletionACKProto.Builder resultBuilder =
           ContainerBlocksDeletionACKProto.newBuilder();
-      containerBlocks.forEach(entry -> {
-        DeleteBlockTransactionResult.Builder txResultBuilder =
-            DeleteBlockTransactionResult.newBuilder();
-        txResultBuilder.setTxID(entry.getTxID());
-        long containerId = entry.getContainerID();
-        int newDeletionBlocks = 0;
-        try {
-          Container cont = containerSet.getContainer(containerId);
-          if (cont == null) {
-            throw new StorageContainerException("Unable to find the container "
-                + containerId, CONTAINER_NOT_FOUND);
-          }
-          ContainerProtos.ContainerType containerType = cont.getContainerType();
-          switch (containerType) {
-          case KeyValueContainer:
-            KeyValueContainerData containerData = (KeyValueContainerData)
-                cont.getContainerData();
-            cont.writeLock();
-            try {
-              if (containerData.getSchemaVersion().equals(SCHEMA_V1)) {
-                markBlocksForDeletionSchemaV1(containerData, entry);
-              } else if (containerData.getSchemaVersion().equals(SCHEMA_V2)) {
-                markBlocksForDeletionSchemaV2(containerData, entry,
-                    newDeletionBlocks, entry.getTxID());
-              } else {
-                throw new UnsupportedOperationException(
-                    "Only schema version 1 and schema version 2 are "
-                        + "supported.");
+      List<Future> futures = new ArrayList<>();
+      for (int i = 0; i < containerBlocks.size(); i++) {
+        DeletedBlocksTransaction tx = containerBlocks.get(i);
+        Future future = executor.submit(() -> {

Review comment:
       Sure.  I refactored the code. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r672849623



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -115,56 +201,71 @@ public void handle(SCMCommand command, OzoneContainer container,
 
       ContainerBlocksDeletionACKProto.Builder resultBuilder =
           ContainerBlocksDeletionACKProto.newBuilder();
-      containerBlocks.forEach(entry -> {
-        DeleteBlockTransactionResult.Builder txResultBuilder =
-            DeleteBlockTransactionResult.newBuilder();
-        txResultBuilder.setTxID(entry.getTxID());
-        long containerId = entry.getContainerID();
-        int newDeletionBlocks = 0;
-        try {
-          Container cont = containerSet.getContainer(containerId);
-          if (cont == null) {
-            throw new StorageContainerException("Unable to find the container "
-                + containerId, CONTAINER_NOT_FOUND);
-          }
-          ContainerProtos.ContainerType containerType = cont.getContainerType();
-          switch (containerType) {
-          case KeyValueContainer:
-            KeyValueContainerData containerData = (KeyValueContainerData)
-                cont.getContainerData();
-            cont.writeLock();
-            try {
-              if (containerData.getSchemaVersion().equals(SCHEMA_V1)) {
-                markBlocksForDeletionSchemaV1(containerData, entry);
-              } else if (containerData.getSchemaVersion().equals(SCHEMA_V2)) {
-                markBlocksForDeletionSchemaV2(containerData, entry,
-                    newDeletionBlocks, entry.getTxID());
-              } else {
-                throw new UnsupportedOperationException(
-                    "Only schema version 1 and schema version 2 are "
-                        + "supported.");
+      List<Future> futures = new ArrayList<>();
+      for (int i = 0; i < containerBlocks.size(); i++) {
+        DeletedBlocksTransaction tx = containerBlocks.get(i);
+        Future future = executor.submit(() -> {

Review comment:
       Sure.  I refactored the code. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r686171070



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/TestContainerReplication.java
##########
@@ -134,7 +134,6 @@ public void testContainerReplication() throws Exception {
 
     cluster.shutdownHddsDatanode(keyLocation.getPipeline().getFirstNode());
 
-    waitForReplicaCount(containerID, 2, cluster);

Review comment:
       Why was this removed?  Without this, the test may give false negative: replicate count may be 3 both initially and after re-replication, and we need to distinguish between the two.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883843525


   Hi @adoroszlai ,  do we need to do something to free disk space if "Warning:  ForkStarter IOException: java.io.IOException: No space left on device" is found? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r672938152



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -76,34 +84,177 @@
   private final ConfigurationSource conf;
   private int invocationCount;
   private long totalTime;
-  private boolean cmdExecuted;
+  private final ExecutorService executor;
+  private final LinkedBlockingQueue<DeleteCmdInfo> deleteCommandQueues;
+  private final Daemon handlerThread;
 
   public DeleteBlocksCommandHandler(ContainerSet cset,
-      ConfigurationSource conf) {
+      ConfigurationSource conf, int threadPoolSize, int queueLimit) {
     this.containerSet = cset;
     this.conf = conf;
+    this.executor = new ThreadPoolExecutor(
+        0, threadPoolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("DeleteBlocksCommandHandlerThread-%d")
+            .build());
+    this.deleteCommandQueues = new LinkedBlockingQueue<>(queueLimit);
+    handlerThread = new Daemon(new DeleteCmdWorker());
+    handlerThread.start();
   }
 
   @Override
   public void handle(SCMCommand command, OzoneContainer container,
       StateContext context, SCMConnectionManager connectionManager) {
-    cmdExecuted = false;
-    long startTime = Time.monotonicNow();
-    ContainerBlocksDeletionACKProto blockDeletionACK = null;
+    if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          SCMCommandProto.Type.deleteBlocksCommand, command.getType());
+      return;
+    }
+
     try {
-      if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
-        LOG.warn("Skipping handling command, expected command "
-                + "type {} but found {}",
-            SCMCommandProto.Type.deleteBlocksCommand, command.getType());
-        return;
+      DeleteCmdInfo cmd = new DeleteCmdInfo((DeleteBlocksCommand) command,
+          container, context, connectionManager);
+      deleteCommandQueues.add(cmd);
+    } catch (Exception e) {
+      LOG.warn("Command is discarded because of the command queue is full");

Review comment:
       is this right? Looking at AbstractQueue, the add() can throw these exceptions:
   
   IllegalStateException - if the element cannot be added at this time due to capacity restrictions
   ClassCastException - if the class of the specified element prevents it from being added to this queue
   NullPointerException - if the specified element is null and this queue does not permit null elements
   IllegalArgumentException - if some property of this element prevents it from being added to this queue




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
jojochuang commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-883167625


   > ![图片](https://user-images.githubusercontent.com/19790142/125918077-fe99f6ab-66af-4e35-b83d-2629555d536b.png)
   > 
   > This is the rocksDB cache metrics data captured from one DN. It costs about 200ms for open a rocksdb to persist the block deletion trasaction command. Becased on current default configuration, one DN can handle at most 60s / 200ms = 300 rocksdb opens if cache missed in 60s which is SCM block deleting command generation interval.
   
   This is yet another reason why we should reconsider the 1-db-per-container design :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r685169173



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -76,34 +84,179 @@
   private final ConfigurationSource conf;
   private int invocationCount;
   private long totalTime;
-  private boolean cmdExecuted;
+  private final ExecutorService executor;
+  private final LinkedBlockingQueue<DeleteCmdInfo> deleteCommandQueues;
+  private final Daemon handlerThread;
 
   public DeleteBlocksCommandHandler(ContainerSet cset,
-      ConfigurationSource conf) {
+      ConfigurationSource conf, int threadPoolSize, int queueLimit) {
     this.containerSet = cset;
     this.conf = conf;
+    this.executor = new ThreadPoolExecutor(
+        0, threadPoolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("DeleteBlocksCommandHandlerThread-%d")
+            .build());
+    this.deleteCommandQueues = new LinkedBlockingQueue<>(queueLimit);
+    handlerThread = new Daemon(new DeleteCmdWorker());
+    handlerThread.start();
   }
 
   @Override
   public void handle(SCMCommand command, OzoneContainer container,
       StateContext context, SCMConnectionManager connectionManager) {
-    cmdExecuted = false;
-    long startTime = Time.monotonicNow();
-    ContainerBlocksDeletionACKProto blockDeletionACK = null;
+    if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
+      LOG.warn("Skipping handling command, expected command "
+              + "type {} but found {}",
+          SCMCommandProto.Type.deleteBlocksCommand, command.getType());
+      return;
+    }
+
     try {
-      if (command.getType() != SCMCommandProto.Type.deleteBlocksCommand) {
-        LOG.warn("Skipping handling command, expected command "
-                + "type {} but found {}",
-            SCMCommandProto.Type.deleteBlocksCommand, command.getType());
-        return;
+      DeleteCmdInfo cmd = new DeleteCmdInfo((DeleteBlocksCommand) command,
+          container, context, connectionManager);
+      deleteCommandQueues.add(cmd);
+    } catch (IllegalStateException e) {
+      LOG.warn("Command is discarded because of the command queue is full");
+      return;
+    }
+  }
+
+  /**
+   * A delete command info.
+   */
+  public static final class DeleteCmdInfo {
+    private DeleteBlocksCommand cmd;
+    private StateContext context;
+    private OzoneContainer container;
+    private SCMConnectionManager connectionManager;
+
+    public DeleteCmdInfo(DeleteBlocksCommand command, OzoneContainer container,
+        StateContext context, SCMConnectionManager connectionManager) {
+      this.cmd = command;
+      this.context = context;
+      this.container = container;
+      this.connectionManager = connectionManager;
+    }
+    public DeleteBlocksCommand getCmd() {
+      return this.cmd;
+    }
+    public StateContext getContext() {
+      return this.context;
+    }
+    public OzoneContainer getContainer() {
+      return this.container;
+    }
+    public SCMConnectionManager getConnectionManager() {
+      return this.connectionManager;
+    }
+  }
+
+  /**
+   * Process delete commands.
+   */
+  public final class DeleteCmdWorker implements Runnable {
+
+    @Override
+    public void run() {
+      while (true) {
+        while (!deleteCommandQueues.isEmpty()) {
+          DeleteCmdInfo cmd = deleteCommandQueues.poll();
+          try {
+            processCmd(cmd.getCmd(), cmd.getContainer(), cmd.getContext(),

Review comment:
       Sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi merged pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #2420:
URL: https://github.com/apache/ozone/pull/2420


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-896167864


   `TestBlockDeletion` timed out twice on `master` since this commit.
   
   * https://github.com/elek/ozone-build-results/blob/master/2021/08/10/9545/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion.txt
   * https://github.com/elek/ozone-build-results/blob/master/2021/08/10/9555/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion.txt
   
   I think this should be fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-881275959


   ![图片](https://user-images.githubusercontent.com/19790142/125918077-fe99f6ab-66af-4e35-b83d-2629555d536b.png)
   
   This is the rocksDB cache metrics data captured from one DN.  It costs about 200ms for open a rocksdb to persist the block deletion trasaction command. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#discussion_r672849623



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java
##########
@@ -115,56 +201,71 @@ public void handle(SCMCommand command, OzoneContainer container,
 
       ContainerBlocksDeletionACKProto.Builder resultBuilder =
           ContainerBlocksDeletionACKProto.newBuilder();
-      containerBlocks.forEach(entry -> {
-        DeleteBlockTransactionResult.Builder txResultBuilder =
-            DeleteBlockTransactionResult.newBuilder();
-        txResultBuilder.setTxID(entry.getTxID());
-        long containerId = entry.getContainerID();
-        int newDeletionBlocks = 0;
-        try {
-          Container cont = containerSet.getContainer(containerId);
-          if (cont == null) {
-            throw new StorageContainerException("Unable to find the container "
-                + containerId, CONTAINER_NOT_FOUND);
-          }
-          ContainerProtos.ContainerType containerType = cont.getContainerType();
-          switch (containerType) {
-          case KeyValueContainer:
-            KeyValueContainerData containerData = (KeyValueContainerData)
-                cont.getContainerData();
-            cont.writeLock();
-            try {
-              if (containerData.getSchemaVersion().equals(SCHEMA_V1)) {
-                markBlocksForDeletionSchemaV1(containerData, entry);
-              } else if (containerData.getSchemaVersion().equals(SCHEMA_V2)) {
-                markBlocksForDeletionSchemaV2(containerData, entry,
-                    newDeletionBlocks, entry.getTxID());
-              } else {
-                throw new UnsupportedOperationException(
-                    "Only schema version 1 and schema version 2 are "
-                        + "supported.");
+      List<Future> futures = new ArrayList<>();
+      for (int i = 0; i < containerBlocks.size(); i++) {
+        DeletedBlocksTransaction tx = containerBlocks.get(i);
+        Future future = executor.submit(() -> {

Review comment:
       Sure.  I refactored the code. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2420: HDDS-5360. DN failed to process all delete block commands in one heartbeat interval

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2420:
URL: https://github.com/apache/ozone/pull/2420#issuecomment-896778210


   > 
   > 
   > `TestBlockDeletion` timed out twice on `master` since this commit.
   > 
   >     * https://github.com/elek/ozone-build-results/blob/master/2021/08/10/9545/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion.txt
   > 
   >     * https://github.com/elek/ozone-build-results/blob/master/2021/08/10/9555/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion.txt
   > 
   > 
   > I think this should be fixed.
   
   Sure.  I will take a look of timeout issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org