You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "sumitagrawl (via GitHub)" <gi...@apache.org> on 2023/06/25 05:13:01 UTC

[GitHub] [ozone] sumitagrawl opened a new pull request, #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

sumitagrawl opened a new pull request, #4972:
URL: https://github.com/apache/ozone/pull/4972

   ## What changes were proposed in this pull request?
   
   - state DELETED is added in container file before container delete
   
   current steps:
   ```
   1. Move container directory to tmp deleted containers directory on the same file system (atomic).
   2. Delete DB entries
   3. Delete container from tmp directory.
   ```
   new steps:
   ```
   1. Mark container state as DELETED in container yaml file
   2. Move container directory to tmp deleted containers directory on the same file system (atomic).
   3. Delete DB entries
   4. Delete container from tmp directory.
   
   So during startup,
   If any container in DELETED state, move to temp Directory
   container present in active directory, and state is not deleted, no need delete from DB, else delete as part of cleanup
   ```
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-8770
   
   ## How was this patch tested?
   
   - TODO UT after [HDDS-8771. Refactor volume level tmp directory for generic usage.](https://github.com/apache/ozone/pull/4838/files#top)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] guohao-rosicky commented on pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "guohao-rosicky (via GitHub)" <gi...@apache.org>.
guohao-rosicky commented on PR #4972:
URL: https://github.com/apache/ozone/pull/4972#issuecomment-1614831889

   @sumitagrawl  thanks work on this.
   
   If the datanode process dies while executing the deletion of a container, will the subsequent startup of the datanode continue to delete the container?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sumitagrawl commented on a diff in pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on code in PR #4972:
URL: https://github.com/apache/ozone/pull/4972#discussion_r1260655821


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java:
##########
@@ -149,6 +149,31 @@ public static void removeContainer(KeyValueContainerData containerData,
         .getMetadataPath());
     File chunksPath = new File(containerData.getChunksPath());
 
+    if (!containerData.hasSchema(OzoneConsts.SCHEMA_V3)) {

Review Comment:
   Removed this as not required, cache cleanup is already done at caller. other places during startup, this call not required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a diff in pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #4972:
URL: https://github.com/apache/ozone/pull/4972#discussion_r1248191412


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -1336,6 +1336,7 @@ private void deleteInternal(Container container, boolean force)
 
         // Rename container location
         try {
+          container.markContainerForDelete();

Review Comment:
   After the addition of this call, line 1380 at the end of this method can be removed. The `ContainerData` object should still be intact throughout this method even after it is removed from the `ContainerSet`.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -365,6 +365,22 @@ public void markContainerUnhealthy() throws StorageContainerException {
             prevState);
   }
 
+  @Override
+  public void markContainerForDelete() throws StorageContainerException {
+    writeLock();
+    ContainerDataProto.State prevState = containerData.getState();
+    try {
+      updateContainerData(() ->
+          containerData.setState(ContainerDataProto.State.DELETED));
+      clearPendingPutBlockCache();

Review Comment:
   I don't think this line is necessary since the container should already be closed and not accepting put blocks when this is called. I don't think it causes any harm having it here though.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/VersionEndpointTask.java:
##########
@@ -135,4 +145,27 @@ private void checkVolumeSet(MutableVolumeSet volumeSet,
       volumeSet.writeUnlock();
     }
   }
+
+  private void cleanupDeletedContainer(HddsVolume volume) {

Review Comment:
   Lets move the startup cleanup steps somewhere inside `ContainerReader` so that they can be done as part of the initial iteration that builds up the container set with one thread per volume, instead of doing a second iteration of all containers in one thread afterwards. Probably inside `ContainerRead#verifyAndFixupContainerData` or `KeyValueContainerUtil#parseKVContainerData`.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -365,6 +365,22 @@ public void markContainerUnhealthy() throws StorageContainerException {
             prevState);
   }
 
+  @Override
+  public void markContainerForDelete() throws StorageContainerException {
+    writeLock();
+    ContainerDataProto.State prevState = containerData.getState();
+    try {
+      updateContainerData(() ->
+          containerData.setState(ContainerDataProto.State.DELETED));
+      clearPendingPutBlockCache();
+    } finally {
+      writeUnlock();
+    }
+    LOG.warn("Moving container {} to state {} from state:{}",
+        containerData.getContainerPath(), containerData.getState(),
+        prevState);

Review Comment:
   This should probably be an info log.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -365,6 +365,22 @@ public void markContainerUnhealthy() throws StorageContainerException {
             prevState);
   }
 
+  @Override
+  public void markContainerForDelete() throws StorageContainerException {
+    writeLock();
+    ContainerDataProto.State prevState = containerData.getState();
+    try {
+      updateContainerData(() ->
+          containerData.setState(ContainerDataProto.State.DELETED));

Review Comment:
   I think we should change `updateContainerData` to not roll back the container state if the update of the container file fails, similar to what it does for the unhealthy state. If the file update fails we can continue with the rest of the delete process, otherwise if we have a container with a corrupt/missing container file we will never be able to delete it.
   
   This could leave orphan entries in Volume RocksDB only if:
   1. Container file update fails
   2. RocksDB update also fails or datanode crashes between these two steps.
   These orphan entries should not be harmful and this is a rare case so I think this is ok.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sumitagrawl commented on pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on PR #4972:
URL: https://github.com/apache/ozone/pull/4972#issuecomment-1621115401

   @errose28 All changes are done, please review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 merged pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 merged PR #4972:
URL: https://github.com/apache/ozone/pull/4972


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sumitagrawl commented on a diff in pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on code in PR #4972:
URL: https://github.com/apache/ozone/pull/4972#discussion_r1260656619


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java:
##########
@@ -232,4 +240,20 @@ public void verifyAndFixupContainerData(ContainerData containerData)
           ContainerProtos.Result.UNKNOWN_CONTAINER_TYPE);
     }
   }
+
+  private void cleanupDeletedContainer(

Review Comment:
   normal delete, db cleanup and move is done within lock. But deleted directory is done outside lock. So these 2 steps can not be combined here as no lock required here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a diff in pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #4972:
URL: https://github.com/apache/ozone/pull/4972#discussion_r1262007986


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java:
##########
@@ -232,4 +239,21 @@ public void verifyAndFixupContainerData(ContainerData containerData)
           ContainerProtos.Result.UNKNOWN_CONTAINER_TYPE);
     }
   }
+
+  private void cleanupContainer(
+      HddsVolume volume, KeyValueContainer kvContainer) {
+    try {
+      LOG.info("Cleanup container {}.",
+          kvContainer.getContainerData().getContainerID());

Review Comment:
   nit.
   ```suggestion
         LOG.info("Finishing delete of container {}.",
             kvContainer.getContainerData().getContainerID());
   ```



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java:
##########
@@ -232,4 +239,21 @@ public void verifyAndFixupContainerData(ContainerData containerData)
           ContainerProtos.Result.UNKNOWN_CONTAINER_TYPE);
     }
   }
+
+  private void cleanupContainer(
+      HddsVolume volume, KeyValueContainer kvContainer) {
+    try {
+      LOG.info("Cleanup container {}.",
+          kvContainer.getContainerData().getContainerID());
+      // container information from db is removed for V3
+      // and container moved to tmp folder
+      // then container content removed from tmp folder
+      KeyValueContainerUtil.removeContainer(kvContainer.getContainerData(),
+          volume.getConf());
+      kvContainer.delete();
+    } catch (IOException ex) {
+      LOG.warn("Failed to remove container {}.",
+          kvContainer.getContainerData().getContainerID(), ex);

Review Comment:
   nit.
   ```suggestion
         LOG.warn("Failed to remove deleted container {}.",
             kvContainer.getContainerData().getContainerID(), ex);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a diff in pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #4972:
URL: https://github.com/apache/ozone/pull/4972#discussion_r1260546981


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java:
##########
@@ -160,6 +160,11 @@ void updateDataScanTimestamp(Instant timestamp)
    */
   void markContainerUnhealthy() throws StorageContainerException;
 
+  /**
+   * Marks the container replica as deleted.
+   */
+  void markContainerForDelete() throws StorageContainerException;

Review Comment:
   nit. The exception can be removed from the method signature here and in the implementation method.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java:
##########
@@ -232,4 +240,20 @@ public void verifyAndFixupContainerData(ContainerData containerData)
           ContainerProtos.Result.UNKNOWN_CONTAINER_TYPE);
     }
   }
+
+  private void cleanupDeletedContainer(

Review Comment:
   Can we use a method like this in both the startup cleanup code and the main container delete path? That will keep the same sequence of steps in one place.



##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/ozone/container/common/TestEndPoint.java:
##########
@@ -182,70 +179,30 @@ public void testDeletedContainersClearedOnStartup() throws Exception {
     conf.setFromObject(new ReplicationConfig().setPort(0));
     try (EndpointStateMachine rpcEndPoint = createEndpoint(conf,
         serverAddress, 1000)) {
-      DatanodeDetails datanodeDetails = randomDatanodeDetails();
-      OzoneContainer ozoneContainer = new OzoneContainer(
-          datanodeDetails, conf, getContext(datanodeDetails));
+      OzoneContainer ozoneContainer = createVolume(conf);
+      HddsVolume hddsVolume = (HddsVolume) ozoneContainer.getVolumeSet()
+          .getVolumesList().get(0);
+      KeyValueContainer kvContainer = addContainer(conf, hddsVolume);
+      // For testing, we are moving the container under the tmp directory,
+      // in order to delete during datanode startup or shutdown
+      KeyValueContainerUtil.moveToDeletedContainerDir(
+          kvContainer.getContainerData(), hddsVolume);

Review Comment:
   nit. Might be good to check that the deleted containers directory has content here so we know this call succeeded and the startup resulted in the directory being cleared.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java:
##########
@@ -250,52 +241,6 @@ public void cleanDeletedContainerDir() {
       // so it won't be loaded in the container set
       // but there will be orphaned entries in the volume's RocksDB.

Review Comment:
   We can update this comment since we are not deleting RocksDB entries in this method anymore.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java:
##########
@@ -149,6 +149,31 @@ public static void removeContainer(KeyValueContainerData containerData,
         .getMetadataPath());
     File chunksPath = new File(containerData.getChunksPath());
 
+    if (!containerData.hasSchema(OzoneConsts.SCHEMA_V3)) {
+      // Close the DB connection and remove the DB handler from cache
+      BlockUtils.removeDB(containerData, conf);
+    }
+
+    // Delete the Container MetaData path.
+    FileUtils.deleteDirectory(containerMetaDataPath);
+
+    //Delete the Container Chunks Path.
+    FileUtils.deleteDirectory(chunksPath);
+
+    //Delete Container directory
+    FileUtils.deleteDirectory(containerMetaDataPath.getParentFile());

Review Comment:
   I think we should update `KeyValueContainerUtil#moveToDeletedContainerDir` to not bother updating the paths of the in-memory container data. Then instead of deleting the directories individually we can simplify this method to just remove the DB cache handles and delete the container from the tmp dir with one call to `FileUtils#deleteDirectory`. This will reduce the size of this method and it could be absorbed into `KeyValueContainer#delete` (its only caller) since we already have a lot of delete related methods floating around different classes.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -1350,6 +1350,9 @@ private void deleteInternal(Container container, boolean force)
 
         // Rename container location
         try {
+          container.markContainerForDelete();

Review Comment:
   We should remove the container from the containerSet after this line, and move both those lines just before the try block. Currently the catch block may throw an exception if DB update or container move fails, which would leave a DELETED container in the container set.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java:
##########
@@ -149,6 +149,31 @@ public static void removeContainer(KeyValueContainerData containerData,
         .getMetadataPath());
     File chunksPath = new File(containerData.getChunksPath());
 
+    if (!containerData.hasSchema(OzoneConsts.SCHEMA_V3)) {

Review Comment:
   Not related to this line specifically, but we should update the comment block on `KeyValueContainerUtil#moveToDeletedDir` to reflect the current delete steps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on PR #4972:
URL: https://github.com/apache/ozone/pull/4972#issuecomment-1615137197

   > @sumitagrawl thanks work on this.
   > 
   > If a data node process dies during the execution of deleting a container, will the subsequent startup of the data node continue to delete the container and is this operation asynchronous? Will it affect the startup speed?
   
   Hi @guohao-rosicky thanks for reviewing the change. The first step of the delete process after this change is to update the container state to DELETED in the container file, which allows the datanode to finish the delete on restart if it failed partway through earlier. The way the change was written at the time of your comment may have affected startup speed slightly since it was doing a single threaded iteration of all containers in memory. However, with [this suggested change](https://github.com/apache/ozone/pull/4972#discussion_r1248229798) there should be no startup impact, because the deletion would be done as part of the existing container set construction on startup. This existing process uses one thread per volume to build up the datanode's in memory set of containers based on those that are present on the disks, and is a pre-requisite to the datanode startup completing. Delete failure should be uncommon so handling them when they are seen on restart in this area should no
 t affect startup time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sumitagrawl commented on pull request #4972: HDDS-8770. Cleanup of failed container delete may remove datanode RocksDB entries of active container

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on PR #4972:
URL: https://github.com/apache/ozone/pull/4972#issuecomment-1631919915

   @errose28 Updated the comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org