Posted to issues@ozone.apache.org by "sumitagrawl (via GitHub)" <gi...@apache.org> on 2023/08/22 06:41:26 UTC

[GitHub] [ozone] sumitagrawl opened a new pull request, #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

sumitagrawl opened a new pull request, #5207:
URL: https://github.com/apache/ozone/pull/5207

   ## What changes were proposed in this pull request?
   
   - Change the max limit for DirectoryDeletingService from 10000 to 6000; assuming roughly 4KB per entry (ACLs, key/file name, and other metadata), 6000 entries come to about 24MB.
   - If the overall request size crosses the size limit and, after excluding the latest entry, the request has nothing left to delete, fall back to a limit of 1000 entries for the task and retry with that (see the sketch below).
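   
   A minimal sketch of the fallback described above (not the actual patch): it assumes protobuf messages, so `getSerializedSize()` is available, and the names `SizeBoundedBatcher`, `nextBatch`, `MAX_DIRS_PER_TASK`, and `MIN_ERR_LIMIT_PER_TASK` are hypothetical:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   import com.google.protobuf.Message;
   
   /**
    * Illustrative sketch only: collect purge entries until the serialized
    * size would cross the Ratis byte limit; if the size check leaves the
    * batch empty, retry the task with the reduced 1000-entry limit.
    */
   final class SizeBoundedBatcher {
     private static final int MAX_DIRS_PER_TASK = 6000;      // new default described above
     private static final int MIN_ERR_LIMIT_PER_TASK = 1000; // fallback limit
   
     static <T extends Message> List<T> nextBatch(List<T> pending, long ratisByteLimit) {
       List<T> batch = new ArrayList<>();
       long usedBytes = 0;
       for (T entry : pending) {
         long size = entry.getSerializedSize();  // memoized by protobuf, so cheap to call
         if (usedBytes + size > ratisByteLimit || batch.size() >= MAX_DIRS_PER_TASK) {
           break;  // submit what fits; the rest is picked up in the next run
         }
         batch.add(entry);
         usedBytes += size;
       }
       if (batch.isEmpty() && !pending.isEmpty()) {
         // Dropping the oversized latest entry left nothing to delete: retry
         // the task with the reduced limit instead of overflowing Ratis.
         batch.addAll(pending.subList(0, Math.min(MIN_ERR_LIMIT_PER_TASK, pending.size())));
       }
       return batch;
     }
   }
   ```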
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-8977
   
   ## How was this patch tested?
   
   - A test case is added to verify the scenario where the size limit is crossed and the task retries with the 1000-entry limit.
   




[GitHub] [ozone] sadanand48 commented on a diff in pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sadanand48 (via GitHub)" <gi...@apache.org>.
sadanand48 commented on code in PR #5207:
URL: https://github.com/apache/ozone/pull/5207#discussion_r1303846061


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java:
##########
@@ -89,6 +91,7 @@ public class SnapshotDeletingService extends AbstractKeyDeletingService {
   // from the same table and can send deletion requests for same snapshot
   // multiple times.
   private static final int SNAPSHOT_DELETING_CORE_POOL_SIZE = 1;
+  private static final int MIN_ERR_LIMIT_PER_TASK = 1000;

Review Comment:
   Are we sure that 1000 entries would be well under the Ratis Byte limit or should we calculate this based on the remainingBufLimit?



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestDirectoryDeletingService.java:
##########
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.ozone.om.service;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.hdds.client.StandaloneReplicationConfig;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
+import org.apache.hadoop.hdds.server.ServerUtils;
+import org.apache.hadoop.hdds.utils.db.DBConfigFromFile;
+import org.apache.hadoop.ozone.om.KeyManager;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.ozone.om.OmTestManagers;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.ozone.om.helpers.BucketLayout;
+import org.apache.hadoop.ozone.om.helpers.OmBucketInfo;
+import org.apache.hadoop.ozone.om.helpers.OmDirectoryInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyArgs;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol;
+import org.apache.hadoop.ozone.om.request.OMRequestTestUtils;
+import org.apache.hadoop.util.Time;
+import org.apache.ozone.test.GenericTestUtils;
+import org.apache.ratis.util.ExitUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_DIR_DELETING_SERVICE_INTERVAL;
+
+/**
+ * Test Directory Deleting Service.
+ */
+public class TestDirectoryDeletingService {
+  @Rule
+  public TemporaryFolder folder = new TemporaryFolder();
+  private OzoneManagerProtocol writeClient;
+  private OzoneManager om;
+  private String volumeName;
+  private String bucketName;
+
+  @BeforeClass
+  public static void setup() {
+    ExitUtils.disableSystemExit();
+  }
+
+  private OzoneConfiguration createConfAndInitValues() throws IOException {
+    OzoneConfiguration conf = new OzoneConfiguration();
+    File newFolder = folder.newFolder();
+    if (!newFolder.exists()) {
+      Assert.assertTrue(newFolder.mkdirs());
+    }
+    System.setProperty(DBConfigFromFile.CONFIG_DIR, "/");
+    ServerUtils.setOzoneMetaDirPath(conf, newFolder.toString());
+    conf.setTimeDuration(OZONE_DIR_DELETING_SERVICE_INTERVAL, 3000,
+        TimeUnit.MILLISECONDS);
+    conf.set(OMConfigKeys.OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT, "4MB");
+    conf.setQuietMode(false);
+
+    volumeName = String.format("volume01");
+    bucketName = String.format("bucket01");
+
+    return conf;
+  }
+
+  @After
+  public void cleanup() throws Exception {
+    om.stop();
+  }
+
+  @Test
+  public void testDeleteDirectoryCrossingSizeLimit() throws Exception {
+    OzoneConfiguration conf = createConfAndInitValues();
+    OmTestManagers omTestManagers
+        = new OmTestManagers(conf);
+    KeyManager keyManager = omTestManagers.getKeyManager();
+    writeClient = omTestManagers.getWriteClient();
+    om = omTestManagers.getOzoneManager();
+
+    OMRequestTestUtils.addVolumeAndBucketToDB(volumeName, bucketName,
+        om.getMetadataManager(), BucketLayout.FILE_SYSTEM_OPTIMIZED);
+
+    String bucketKey = om.getMetadataManager().getBucketKey(
+        volumeName, bucketName);
+    OmBucketInfo bucketInfo = om.getMetadataManager().getBucketTable()
+        .get(bucketKey);
+
+    // create parent directory and 2000 files in it crossing
+    // size of 4MB as defined for the testcase
+    String longName = "test";
+    for (int i = 0; i < 100; ++i) {
+      longName += "0123456789";
+    }
+    OmDirectoryInfo dir1 = new OmDirectoryInfo.Builder()
+        .setName("dir" + longName)
+        .setCreationTime(Time.now())
+        .setModificationTime(Time.now())
+        .setObjectID(1)
+        .setParentObjectID(bucketInfo.getObjectID())
+        .setUpdateID(0)
+        .build();
+    OMRequestTestUtils.addDirKeyToDirTable(true, dir1, volumeName, bucketName,
+        1L, om.getMetadataManager());
+
+    for (int i = 0; i < 2000; ++i) {
+      String keyName = "key" + longName + i;
+      OmKeyInfo omKeyInfo =
+          OMRequestTestUtils.createOmKeyInfo(volumeName, bucketName,
+              keyName, HddsProtos.ReplicationType.RATIS,
+              HddsProtos.ReplicationFactor.ONE, dir1.getObjectID() + 1 + i,
+              dir1.getObjectID(), 100, Time.now());
+      OMRequestTestUtils.addFileToKeyTable(false, true, keyName,
+          omKeyInfo, 1234L, i + 1, om.getMetadataManager());
+    }
+
+    // delete directory recursively
+    OmKeyArgs delArgs = new OmKeyArgs.Builder()
+        .setVolumeName(volumeName)
+        .setBucketName(bucketName)
+        .setKeyName("dir" + longName)
+        .setReplicationConfig(StandaloneReplicationConfig.getInstance(
+            HddsProtos.ReplicationFactor.ONE))
+        .setDataSize(0).setRecursive(true)
+        .build();
+    writeClient.deleteKey(delArgs);
+
+    // wait for file count of only 100 but not all 2000 files
+    // as logic, will take 100 files

Review Comment:
   1000 files?





[GitHub] [ozone] sadanand48 commented on a diff in pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sadanand48 (via GitHub)" <gi...@apache.org>.
sadanand48 commented on code in PR #5207:
URL: https://github.com/apache/ozone/pull/5207#discussion_r1305332888


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java:
##########
@@ -89,6 +91,7 @@ public class SnapshotDeletingService extends AbstractKeyDeletingService {
   // from the same table and can send deletion requests for same snapshot
   // multiple times.
   private static final int SNAPSHOT_DELETING_CORE_POOL_SIZE = 1;
+  private static final int MIN_ERR_LIMIT_PER_TASK = 1000;

Review Comment:
   Thanks for explaining.





[GitHub] [ozone] sadanand48 commented on pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sadanand48 (via GitHub)" <gi...@apache.org>.
sadanand48 commented on PR #5207:
URL: https://github.com/apache/ozone/pull/5207#issuecomment-1696907645

   Thanks @sumitagrawl for the patch, @errose28 for the review/comments.




[GitHub] [ozone] sumitagrawl commented on a diff in pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on code in PR #5207:
URL: https://github.com/apache/ozone/pull/5207#discussion_r1305226875


##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestDirectoryDeletingService.java:
##########
+    // wait for file count of only 100 but not all 2000 files
+    // as logic, will take 100 files

Review Comment:
   done





[GitHub] [ozone] errose28 commented on pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on PR #5207:
URL: https://github.com/apache/ozone/pull/5207#issuecomment-1693816939

   Thanks for working on this @sumitagrawl. I did not get a chance to thoroughly review the changes, but checking the size of the proto is a good idea. Looks like there should not be any performance issues with `getSerializedSize()` according to this old [mail thread](https://groups.google.com/g/protobuf/c/BwzA9jIigZU?pli=1) I dug up. Would it make sense to increase the Ratis buffer limit in the PR as well? After RATIS-1783 that change should be effective.
   
   cc @kerneltime @szetszwo who also looked at this issue in the past.
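   
   A tiny standalone illustration of that point, using plain protobuf-java rather than Ozone code (`StringValue` is just a stand-in message type):
   
   ```java
   import com.google.protobuf.StringValue;
   
   public class SerializedSizeDemo {
     public static void main(String[] args) {
       StringValue msg = StringValue.of("dir0123456789");
       // The first call computes and caches the size inside the message,
       // so repeated calls while assembling a batch stay cheap.
       System.out.println("serialized size = " + msg.getSerializedSize() + " bytes");
       System.out.println("memoized call   = " + msg.getSerializedSize() + " bytes");
     }
   }
   ```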




[GitHub] [ozone] sumitagrawl commented on a diff in pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on code in PR #5207:
URL: https://github.com/apache/ozone/pull/5207#discussion_r1305155976


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java:
##########
@@ -89,6 +91,7 @@ public class SnapshotDeletingService extends AbstractKeyDeletingService {
   // from the same table and can send deletion requests for same snapshot
   // multiple times.
   private static final int SNAPSHOT_DELETING_CORE_POOL_SIZE = 1;
+  private static final int MIN_ERR_LIMIT_PER_TASK = 1000;

Review Comment:
   The default limit is 32MB. Even though 6000 entries at roughly 4KB each come to about 24MB and can still cross the limit, we then fall back to 1000 entries, i.e. 32MB/1000 => ~32KB of budget per entry.
   We cannot compute the size until the request is serialized, and doing so would require something like a binary search until the limit criterion is met. That is not needed: if even 1000 entries do not fit, the user needs to increase the Ratis buffer size.
   
   So this logic is not implemented, as there is no simple way to determine the remaining buffer without serializing.
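   
   A rough, self-contained check of the numbers above (the constants are the approximate values quoted in this thread, not values read from the code):
   
   ```java
   public class BudgetCheck {
     public static void main(String[] args) {
       long ratisLimit = 32L * 1024 * 1024;  // default 32MB appender queue byte limit
       long perEntry = 4L * 1024;            // ~4KB per entry (ACLs, key/file name, other metadata)
       System.out.println(6000 * perEntry + " bytes for 6000 entries");                      // 24576000, roughly 24MB
       System.out.println(ratisLimit / 1000 + " bytes per entry at the 1000-entry fallback"); // 33554, roughly 32KB
     }
   }
   ```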





[GitHub] [ozone] sumitagrawl commented on pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sumitagrawl (via GitHub)" <gi...@apache.org>.
sumitagrawl commented on PR #5207:
URL: https://github.com/apache/ozone/pull/5207#issuecomment-1695911140

   > Would it make sense to increase the Ratis buffer limit in the PR as well? After RATIS-1783 that change should be effective.
   
   @errose28 
   IMO, increasing the Ratis buffer size should be handled separately, as it involves:
   1. performance impact verification
   2. compatibility if rolling upgrades are supported
   3. changes to OM, SCM, and DN, since the buffer is handled in a common way
   




[GitHub] [ozone] sadanand48 merged pull request #5207: HDDS-8977. Ratis crash if a lot of directories deleted at once

Posted by "sadanand48 (via GitHub)" <gi...@apache.org>.
sadanand48 merged PR #5207:
URL: https://github.com/apache/ozone/pull/5207

