You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/09/17 00:57:47 UTC

[GitHub] [ozone] errose28 commented on a diff in pull request #3741: HDDS-6449. Failed container delete can leave artifacts on disk

errose28 commented on code in PR #3741:
URL: https://github.com/apache/ozone/pull/3741#discussion_r973483825


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -1135,6 +1136,20 @@ private void deleteInternal(Container container, boolean force)
               DELETE_ON_NON_EMPTY_CONTAINER);
         }
       }
+      if (container.getContainerData() instanceof KeyValueContainerData) {
+        KeyValueContainerData keyValueContainerData =
+            (KeyValueContainerData) container.getContainerData();
+        if (CleanUpManager
+            .checkContainerSchemaV3Enabled(keyValueContainerData)) {
+          HddsVolume hddsVolume = keyValueContainerData.getVolume();
+
+          // Initialize the directory
+          CleanUpManager cleanUpManager =
+              new CleanUpManager(hddsVolume);
+          // Rename
+          cleanUpManager.renameDir(keyValueContainerData);

Review Comment:
   In context this might be easier to understand if `renameDir` was named to describe the location of the rename or the purpose the rename serves. A name like `moveToTmpDirectory` might be better.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/CleanUpManager.java:
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.common.helpers;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.ozone.OzoneConsts;
+import org.apache.hadoop.ozone.container.common.utils.HddsVolumeUtil;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.hadoop.ozone.container.upgrade.VersionedDatanodeFeatures;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.ListIterator;
+
+/**
+ * Helper class for handling /tmp/container_delete_service
+ * operations used for container delete when Schema V3 is enabled.
+ */
+public class CleanUpManager {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(CleanUpManager.class);
+
+  private static final String FILE_SEPARATOR = File.separator;
+
+  private static final String TMP_DELETE_SERVICE_DIR =
+      FILE_SEPARATOR + "tmp" + FILE_SEPARATOR + "container_delete_service";
+
+  private final Path tmpDirPath;
+
+  public CleanUpManager(HddsVolume hddsVolume) {
+    this.tmpDirPath = setTmpDirPath(hddsVolume);
+
+    if (Files.notExists(tmpDirPath)) {
+      try {
+        Files.createDirectories(tmpDirPath);
+      } catch (IOException ex) {
+        LOG.error("Error creating {}", tmpDirPath.toString(), ex);
+      }
+    }
+  }
+
+  public Path getTmpDirPath() {
+    return tmpDirPath;
+  }
+
+  public static boolean checkContainerSchemaV3Enabled(
+      KeyValueContainerData keyValueContainerData) {
+    return (keyValueContainerData.getSchemaVersion()
+        .equals(OzoneConsts.SCHEMA_V3));
+  }
+
+  public static boolean checkContainerSchemaV3Enabled(
+      ConfigurationSource config) {
+    return VersionedDatanodeFeatures.SchemaV3
+        .isFinalizedAndEnabled(config);
+  }
+
+  private Path setTmpDirPath(HddsVolume hddsVolume) {
+    StringBuilder stringBuilder = new StringBuilder();
+
+    // HddsVolume root directory path
+    String hddsRoot = hddsVolume.getHddsRootDir().toString();
+
+    // HddsVolume path
+    String volPath = HddsVolumeUtil.getHddsRoot(hddsRoot);
+
+    stringBuilder.append(volPath);
+    stringBuilder.append(FILE_SEPARATOR);
+
+    String clusterId = "";
+    try {
+      clusterId += hddsVolume.getClusterID();
+
+      if (clusterId == null) {
+        throw new IOException("Volume has not been initialized," +
+            " clusterId is null.");
+      }
+    } catch (IOException ex) {
+      LOG.error("Exception in CleanUpManger setTmpDirPath.", ex);
+    }
+
+    String pathId = "";
+    try {
+      pathId += VersionedDatanodeFeatures.ScmHA
+          .chooseContainerPathID(hddsVolume, clusterId);
+    } catch (IOException ex) {
+      LOG.error("Failed to get the container path Id", ex);
+    }
+
+    stringBuilder.append(pathId);
+    stringBuilder.append(TMP_DELETE_SERVICE_DIR);
+
+    String tmpPath = stringBuilder.toString();
+    return Paths.get(tmpPath);
+  }
+
+  /**
+   * Renaming container directory path to a new location
+   * under "/tmp/container_delete_service" and
+   * updating metadata and chunks path.
+   * @param keyValueContainerData
+   * @return true if renaming was successful
+   */
+  public boolean renameDir(KeyValueContainerData keyValueContainerData) {
+    String containerPath = keyValueContainerData.getContainerPath();
+    File container = new File(containerPath);
+    String containerDirName = container.getName();
+
+    String destinationDirPath = tmpDirPath
+        .resolve(Paths.get(containerDirName)).toString();
+
+    boolean success = container.renameTo(new File(destinationDirPath));
+
+    if (success) {
+      keyValueContainerData.setMetadataPath(destinationDirPath +
+          FILE_SEPARATOR + OzoneConsts.CONTAINER_META_PATH);
+      keyValueContainerData.setChunksPath(destinationDirPath +
+          FILE_SEPARATOR + OzoneConsts.STORAGE_DIR_CHUNKS);

Review Comment:
   Just a note on this part: These only update the in memory values of the container's location. This is good enough to communicate the information to `KeyValueContainerUtil#removeContainer`. It will not survive a restart, but that is ok since the whole tmp delete directory will be wiped then. This would probably be good to explain in a comment here.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/CleanUpManager.java:
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.common.helpers;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.ozone.OzoneConsts;
+import org.apache.hadoop.ozone.container.common.utils.HddsVolumeUtil;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.hadoop.ozone.container.upgrade.VersionedDatanodeFeatures;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.ListIterator;
+
+/**
+ * Helper class for handling /tmp/container_delete_service

Review Comment:
   Let's expand the javadoc to specify that this tmp directory exists on a per volume basis. /tmp looks like it refers to the shared OS /tmp directory instead of the one on this volume. When tmp is being referred to in other places for comments, we should probably omit the leading slash to indicate the comment is not referring to an absolute path.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/statemachine/background/StaleRecoveringContainerScrubbingService.java:
##########
@@ -84,6 +88,19 @@ static class RecoveringContainerScrubbingTask implements BackgroundTask {
 
     @Override
     public BackgroundTaskResult call() throws Exception {
+      Container container = containerSet.getContainer(containerID);

Review Comment:
   This class is used for EC offline reconstruction. Let's leave that code as is for now. It uses its own tmp directory to build the new container, and may be able to use that for deleting failed reconstructions as well.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java:
##########
@@ -556,6 +569,9 @@ public void terminateDatanode() {
 
   @Override
   public void stop() {
+    if (CleanUpManager.checkContainerSchemaV3Enabled(conf)) {
+      cleanTmpDir();
+    }

Review Comment:
   The shutdown hook should be moved inside the atomic boolean check to make sure it is only invoked once.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/CleanUpManager.java:
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.common.helpers;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.ozone.OzoneConsts;
+import org.apache.hadoop.ozone.container.common.utils.HddsVolumeUtil;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.hadoop.ozone.container.upgrade.VersionedDatanodeFeatures;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.ListIterator;
+
+/**
+ * Helper class for handling /tmp/container_delete_service
+ * operations used for container delete when Schema V3 is enabled.
+ */
+public class CleanUpManager {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(CleanUpManager.class);
+
+  private static final String FILE_SEPARATOR = File.separator;
+
+  private static final String TMP_DELETE_SERVICE_DIR =
+      FILE_SEPARATOR + "tmp" + FILE_SEPARATOR + "container_delete_service";
+
+  private final Path tmpDirPath;
+
+  public CleanUpManager(HddsVolume hddsVolume) {
+    this.tmpDirPath = setTmpDirPath(hddsVolume);
+
+    if (Files.notExists(tmpDirPath)) {
+      try {
+        Files.createDirectories(tmpDirPath);
+      } catch (IOException ex) {
+        LOG.error("Error creating {}", tmpDirPath.toString(), ex);
+      }
+    }
+  }
+
+  public Path getTmpDirPath() {
+    return tmpDirPath;
+  }
+
+  public static boolean checkContainerSchemaV3Enabled(
+      KeyValueContainerData keyValueContainerData) {
+    return (keyValueContainerData.getSchemaVersion()
+        .equals(OzoneConsts.SCHEMA_V3));
+  }
+
+  public static boolean checkContainerSchemaV3Enabled(
+      ConfigurationSource config) {
+    return VersionedDatanodeFeatures.SchemaV3
+        .isFinalizedAndEnabled(config);
+  }
+
+  private Path setTmpDirPath(HddsVolume hddsVolume) {
+    StringBuilder stringBuilder = new StringBuilder();
+
+    // HddsVolume root directory path
+    String hddsRoot = hddsVolume.getHddsRootDir().toString();
+
+    // HddsVolume path
+    String volPath = HddsVolumeUtil.getHddsRoot(hddsRoot);
+
+    stringBuilder.append(volPath);
+    stringBuilder.append(FILE_SEPARATOR);
+
+    String clusterId = "";
+    try {
+      clusterId += hddsVolume.getClusterID();
+
+      if (clusterId == null) {
+        throw new IOException("Volume has not been initialized," +
+            " clusterId is null.");
+      }
+    } catch (IOException ex) {
+      LOG.error("Exception in CleanUpManger setTmpDirPath.", ex);
+    }
+
+    String pathId = "";
+    try {
+      pathId += VersionedDatanodeFeatures.ScmHA
+          .chooseContainerPathID(hddsVolume, clusterId);
+    } catch (IOException ex) {
+      LOG.error("Failed to get the container path Id", ex);
+    }
+
+    stringBuilder.append(pathId);
+    stringBuilder.append(TMP_DELETE_SERVICE_DIR);
+
+    String tmpPath = stringBuilder.toString();
+    return Paths.get(tmpPath);
+  }
+
+  /**
+   * Renaming container directory path to a new location
+   * under "/tmp/container_delete_service" and
+   * updating metadata and chunks path.
+   * @param keyValueContainerData
+   * @return true if renaming was successful
+   */
+  public boolean renameDir(KeyValueContainerData keyValueContainerData) {
+    String containerPath = keyValueContainerData.getContainerPath();
+    File container = new File(containerPath);
+    String containerDirName = container.getName();
+
+    String destinationDirPath = tmpDirPath
+        .resolve(Paths.get(containerDirName)).toString();
+
+    boolean success = container.renameTo(new File(destinationDirPath));
+
+    if (success) {
+      keyValueContainerData.setMetadataPath(destinationDirPath +
+          FILE_SEPARATOR + OzoneConsts.CONTAINER_META_PATH);
+      keyValueContainerData.setChunksPath(destinationDirPath +
+          FILE_SEPARATOR + OzoneConsts.STORAGE_DIR_CHUNKS);
+    }
+    return success;
+  }
+
+  /**
+   * Get direct files under /tmp/container_delete_service
+   * and store them in a list.
+   * @return iterator to the list of the leftover files
+   */
+  public ListIterator<File> getDeleteLeftovers() {
+    List<File> leftovers = new ArrayList<>();
+
+    try {
+      File tmpDir = new File(tmpDirPath.toString());
+
+      for (File file : tmpDir.listFiles()) {
+        leftovers.add(file);
+      }
+    } catch (NullPointerException ex) {
+      LOG.error("Tmp directory is null, path doesn't exist", ex);
+    }
+
+    ListIterator<File> leftoversListIt = leftovers.listIterator();
+
+    return leftoversListIt;
+  }
+
+  /**
+   * @return true if tmpDir is empty
+   */
+  public boolean tmpDirIsEmpty() {

Review Comment:
   I don't think this method is necessary. `cleanTmpDir` will be a no-op if the directory is empty so we can just call it directly.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -296,8 +297,11 @@ public void delete() throws StorageContainerException {
     } catch (StorageContainerException ex) {
       throw ex;
     } catch (IOException ex) {
-      // TODO : An I/O error during delete can leave partial artifacts on the
-      // disk. We will need the cleaner thread to cleanup this information.
+      if (CleanUpManager.checkContainerSchemaV3Enabled(containerData)) {
+        HddsVolume hddsVolume = containerData.getVolume();
+        CleanUpManager cleanUpManager = new CleanUpManager(hddsVolume);
+        cleanUpManager.cleanTmpDir();
+      }

Review Comment:
   I think we can remove this part. The delete path is:
   1. Container is moved to tmp dir
   2. Container is deleted from tmp dir
   
   If 1 failed, the container is not in the tmp dir to delete. If 2 failed, I don't think retrying the same thing a second time here is going to make progress. We can wait until shutdown/restart either wipes the volume's tmp dir or results in the volume being removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org