You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/07/13 09:26:39 UTC

[GitHub] [ozone] JacksonYao287 opened a new pull request, #3591: HDDS-6987.EC: Implement RECOVERING Container Scrubber

JacksonYao287 opened a new pull request, #3591:
URL: https://github.com/apache/ozone/pull/3591

   ## What changes were proposed in this pull request?
   
   Implement stale RECOVERING Container Scrubbing service
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6987
   
   ## How was this patch tested?
   
   unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3591:
URL: https://github.com/apache/ozone/pull/3591#discussion_r920381767


##########
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestStaleRecoveringContainerScrubbingService.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.ozone.container.common;
+
+import org.apache.hadoop.conf.StorageUnit;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException;
+import org.apache.hadoop.ozone.container.common.impl.ContainerLayoutVersion;
+import org.apache.hadoop.ozone.container.common.impl.ContainerSet;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet;
+import org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy;
+import org.apache.hadoop.ozone.container.keyvalue.ContainerTestVersionInfo;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils;
+import org.apache.hadoop.ozone.container.keyvalue.statemachine.background.StaleRecoveringContainerScrubbingService;
+import org.apache.ozone.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Iterator;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State.CLOSED;
+import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State.RECOVERING;
+import static org.mockito.ArgumentMatchers.anyList;
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.Mockito.mock;
+
+/**
+ * Tests to stale recovering container scrubbing service.
+ */
+@RunWith(Parameterized.class)
+public class TestStaleRecoveringContainerScrubbingService {
+
+  @Rule
+  public final TemporaryFolder tempDir = new TemporaryFolder();
+  private String datanodeUuid;
+  private OzoneConfiguration conf;
+  private HddsVolume hddsVolume;
+
+  private final ContainerLayoutVersion layout;
+  private final String schemaVersion;
+  private String clusterID;
+  private int containerIdNum = 0;
+  private MutableVolumeSet volumeSet;
+  private RoundRobinVolumeChoosingPolicy volumeChoosingPolicy;
+
+  public TestStaleRecoveringContainerScrubbingService(
+      ContainerTestVersionInfo versionInfo) {
+    this.layout = versionInfo.getLayout();
+    this.schemaVersion = versionInfo.getSchemaVersion();
+    conf = new OzoneConfiguration();
+    ContainerTestVersionInfo.setTestSchemaVersion(schemaVersion, conf);
+  }
+
+  @Parameterized.Parameters
+  public static Iterable<Object[]> parameters() {
+    return ContainerTestVersionInfo.versionParameters();
+  }
+
+  @Before
+  public void init() throws IOException {
+    File volumeDir = tempDir.newFolder();
+    conf.set(ScmConfigKeys.HDDS_DATANODE_DIR_KEY, volumeDir.getAbsolutePath());
+    conf.set(HddsConfigKeys.OZONE_METADATA_DIRS, volumeDir.getAbsolutePath());
+    datanodeUuid = UUID.randomUUID().toString();
+    hddsVolume = new HddsVolume.Builder(volumeDir.getAbsolutePath())
+        .conf(conf).datanodeUuid(datanodeUuid).clusterID(clusterID).build();
+    volumeSet = mock(MutableVolumeSet.class);
+    clusterID = UUID.randomUUID().toString();
+    volumeChoosingPolicy = mock(RoundRobinVolumeChoosingPolicy.class);
+    Mockito.when(volumeChoosingPolicy.chooseVolume(anyList(), anyLong()))
+        .thenReturn(hddsVolume);
+  }
+
+  @After
+  public void cleanup() throws IOException {
+    BlockUtils.shutdownCache(conf);
+  }
+
+  /**
+   * A helper method to create a number of containers of given state.
+   */
+  private void createTestContainers(
+      ContainerSet containerSet, int num,
+      ContainerProtos.ContainerDataProto.State state)
+      throws StorageContainerException {
+    int end = containerIdNum + num;
+    for (; containerIdNum < end; containerIdNum++) {
+      KeyValueContainerData recoveringContainerData = new KeyValueContainerData(
+          containerIdNum, layout, (long) StorageUnit.GB.toBytes(5),
+          UUID.randomUUID().toString(), datanodeUuid);
+      //create a container with recovering state
+      recoveringContainerData.setState(state);
+
+      KeyValueContainer recoveringKeyValueContainer =
+          new KeyValueContainer(recoveringContainerData,
+              conf);
+      recoveringKeyValueContainer.create(
+          volumeSet, volumeChoosingPolicy, clusterID);
+      containerSet.addContainer(recoveringKeyValueContainer);
+    }
+  }
+
+  @Test
+  public void testScrubbingStaleRecoveringContainers()
+      throws StorageContainerException, InterruptedException, TimeoutException {
+    ContainerSet containerSet = new ContainerSet();
+    StaleRecoveringContainerScrubbingService srcss =
+        new StaleRecoveringContainerScrubbingService(
+            Duration.ofMillis(50).toMillis(),
+            TimeUnit.MILLISECONDS, 10,
+            Duration.ofSeconds(300).toMillis(),
+            containerSet, conf);
+    srcss.setRecoveringTimeout(10);
+
+    srcss.start();
+    GenericTestUtils.waitFor(() -> srcss.getThreadCount() > 0, 100, 3000);
+    createTestContainers(containerSet, 5, CLOSED);
+    Thread.sleep(100L);
+    //closed container should not be scrubbed
+    Assert.assertTrue(containerSet.containerCount() == 5);
+
+    createTestContainers(containerSet, 5, RECOVERING);
+    Thread.sleep(100L);
+    //recovering container should be scrubbed since recovering timeout
+    Assert.assertTrue(containerSet.containerCount() == 5);
+    Iterator<Container<?>> it = containerSet.getContainerIterator();
+    while (it.hasNext()) {
+      Container<?> entry = it.next();
+      Assert.assertTrue(entry.getContainerState().equals(CLOSED));
+    }
+
+    //increase recovering timeout
+    srcss.setRecoveringTimeout(100000);
+    createTestContainers(containerSet, 5, RECOVERING);

Review Comment:
   To avoid this kind of sleeps, we followed an interesting approach of passing the clock ( Credit: Originally suggested by @sodonnel ). So, I would suggest to pass the clock and we can fast forward the clock.
   Please refer: https://github.com/apache/ozone/pull/3008/files



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3591:
URL: https://github.com/apache/ozone/pull/3591#discussion_r921804352


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java:
##########
@@ -187,6 +187,23 @@ public final class OzoneConfigKeys {
   public static final int OZONE_BLOCK_DELETING_SERVICE_WORKERS_DEFAULT
       = 10;
 
+  /**
+   * Configuration properties for Ozone Recovering Container Scrubbing Service.
+   */

Review Comment:
   ```
   diff --git a/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
   index 5fd2eb0ea..687b28f17 100644
   --- a/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
   +++ b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
   @@ -104,6 +104,8 @@ private void addPropertiesNotInXml() {
            OzoneConfigKeys.OZONE_ACL_AUTHORIZER_CLASS_NATIVE,
            OzoneConfigKeys.OZONE_S3_AUTHINFO_MAX_LIFETIME_KEY,
            OzoneConfigKeys.OZONE_CLIENT_REQUIRED_OM_VERSION_MIN_KEY,
   +        OzoneConfigKeys.OZONE_RECOVERING_CONTAINER_SCRUBBING_SERVICE_TIMEOUT,
   +        OzoneConfigKeys.OZONE_RECOVERING_CONTAINER_SCRUBBING_SERVICE_WORKERS,
            ReconConfigKeys.RECON_SCM_CONFIG_PREFIX,
            ReconConfigKeys.OZONE_RECON_ADDRESS_KEY,
            ReconConfigKeys.OZONE_RECON_DATANODE_ADDRESS_KEY,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3591:
URL: https://github.com/apache/ozone/pull/3591#discussion_r920382252


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java:
##########
@@ -187,6 +187,23 @@ public final class OzoneConfigKeys {
   public static final int OZONE_BLOCK_DELETING_SERVICE_WORKERS_DEFAULT
       = 10;
 
+  /**
+   * Configuration properties for Ozone Recovering Container Scrubbing Service.
+   */

Review Comment:
   I think tests are impacted because we did not add this configs into xml files.



##########
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestStaleRecoveringContainerScrubbingService.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.ozone.container.common;
+
+import org.apache.hadoop.conf.StorageUnit;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException;
+import org.apache.hadoop.ozone.container.common.impl.ContainerLayoutVersion;
+import org.apache.hadoop.ozone.container.common.impl.ContainerSet;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet;
+import org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy;
+import org.apache.hadoop.ozone.container.keyvalue.ContainerTestVersionInfo;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils;
+import org.apache.hadoop.ozone.container.keyvalue.statemachine.background.StaleRecoveringContainerScrubbingService;
+import org.apache.ozone.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Iterator;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State.CLOSED;
+import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State.RECOVERING;
+import static org.mockito.ArgumentMatchers.anyList;
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.Mockito.mock;
+
+/**
+ * Tests to stale recovering container scrubbing service.
+ */
+@RunWith(Parameterized.class)
+public class TestStaleRecoveringContainerScrubbingService {
+
+  @Rule
+  public final TemporaryFolder tempDir = new TemporaryFolder();
+  private String datanodeUuid;
+  private OzoneConfiguration conf;
+  private HddsVolume hddsVolume;
+
+  private final ContainerLayoutVersion layout;
+  private final String schemaVersion;
+  private String clusterID;
+  private int containerIdNum = 0;
+  private MutableVolumeSet volumeSet;
+  private RoundRobinVolumeChoosingPolicy volumeChoosingPolicy;
+
+  public TestStaleRecoveringContainerScrubbingService(
+      ContainerTestVersionInfo versionInfo) {
+    this.layout = versionInfo.getLayout();
+    this.schemaVersion = versionInfo.getSchemaVersion();
+    conf = new OzoneConfiguration();
+    ContainerTestVersionInfo.setTestSchemaVersion(schemaVersion, conf);
+  }
+
+  @Parameterized.Parameters
+  public static Iterable<Object[]> parameters() {
+    return ContainerTestVersionInfo.versionParameters();
+  }
+
+  @Before
+  public void init() throws IOException {
+    File volumeDir = tempDir.newFolder();
+    conf.set(ScmConfigKeys.HDDS_DATANODE_DIR_KEY, volumeDir.getAbsolutePath());
+    conf.set(HddsConfigKeys.OZONE_METADATA_DIRS, volumeDir.getAbsolutePath());
+    datanodeUuid = UUID.randomUUID().toString();
+    hddsVolume = new HddsVolume.Builder(volumeDir.getAbsolutePath())
+        .conf(conf).datanodeUuid(datanodeUuid).clusterID(clusterID).build();
+    volumeSet = mock(MutableVolumeSet.class);
+    clusterID = UUID.randomUUID().toString();
+    volumeChoosingPolicy = mock(RoundRobinVolumeChoosingPolicy.class);
+    Mockito.when(volumeChoosingPolicy.chooseVolume(anyList(), anyLong()))
+        .thenReturn(hddsVolume);
+  }
+
+  @After
+  public void cleanup() throws IOException {
+    BlockUtils.shutdownCache(conf);
+  }
+
+  /**
+   * A helper method to create a number of containers of given state.
+   */
+  private void createTestContainers(
+      ContainerSet containerSet, int num,
+      ContainerProtos.ContainerDataProto.State state)
+      throws StorageContainerException {
+    int end = containerIdNum + num;
+    for (; containerIdNum < end; containerIdNum++) {
+      KeyValueContainerData recoveringContainerData = new KeyValueContainerData(
+          containerIdNum, layout, (long) StorageUnit.GB.toBytes(5),
+          UUID.randomUUID().toString(), datanodeUuid);
+      //create a container with recovering state
+      recoveringContainerData.setState(state);
+
+      KeyValueContainer recoveringKeyValueContainer =
+          new KeyValueContainer(recoveringContainerData,
+              conf);
+      recoveringKeyValueContainer.create(
+          volumeSet, volumeChoosingPolicy, clusterID);
+      containerSet.addContainer(recoveringKeyValueContainer);
+    }
+  }
+
+  @Test
+  public void testScrubbingStaleRecoveringContainers()
+      throws StorageContainerException, InterruptedException, TimeoutException {
+    ContainerSet containerSet = new ContainerSet();
+    StaleRecoveringContainerScrubbingService srcss =
+        new StaleRecoveringContainerScrubbingService(
+            Duration.ofMillis(50).toMillis(),
+            TimeUnit.MILLISECONDS, 10,
+            Duration.ofSeconds(300).toMillis(),
+            containerSet, conf);
+    srcss.setRecoveringTimeout(10);
+
+    srcss.start();
+    GenericTestUtils.waitFor(() -> srcss.getThreadCount() > 0, 100, 3000);
+    createTestContainers(containerSet, 5, CLOSED);
+    Thread.sleep(100L);
+    //closed container should not be scrubbed
+    Assert.assertTrue(containerSet.containerCount() == 5);
+
+    createTestContainers(containerSet, 5, RECOVERING);
+    Thread.sleep(100L);
+    //recovering container should be scrubbed since recovering timeout
+    Assert.assertTrue(containerSet.containerCount() == 5);
+    Iterator<Container<?>> it = containerSet.getContainerIterator();
+    while (it.hasNext()) {
+      Container<?> entry = it.next();
+      Assert.assertTrue(entry.getContainerState().equals(CLOSED));
+    }
+
+    //increase recovering timeout
+    srcss.setRecoveringTimeout(100000);
+    createTestContainers(containerSet, 5, RECOVERING);

Review Comment:
   ex:
       testClock.fastForward(300 * 1000 + 1);



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##########
@@ -72,6 +75,9 @@ public boolean addContainer(Container<?> container) throws
       }
       // wish we could have done this from ContainerData.setState
       container.getContainerData().commitSpace();
+      if (container.getContainerData().getState() == RECOVERING) {

Review Comment:
   Also we can set the time currentTime + <Configured Interval>, so that we can just check whether time elapsed or not when checking. We don;t need to do + op every time.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##########
@@ -72,6 +75,9 @@ public boolean addContainer(Container<?> container) throws
       }
       // wish we could have done this from ContainerData.setState
       container.getContainerData().commitSpace();
+      if (container.getContainerData().getState() == RECOVERING) {

Review Comment:
   As suggested below in tests, we can use clock.millis() for time.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -989,8 +991,14 @@ public void markContainerForClose(Container container)
       throws IOException {
     container.writeLock();
     try {
+      ContainerProtos.ContainerDataProto.State state =
+          container.getContainerState();
       // Move the container to CLOSING state only if it's OPEN/RECOVERING
-      if (HddsUtils.isOpenToWriteState(container.getContainerState())) {
+      if (HddsUtils.isOpenToWriteState(state)) {
+        if (state == RECOVERING) {

Review Comment:
   Why are we removing here?



##########
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestStaleRecoveringContainerScrubbingService.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.ozone.container.common;
+
+import org.apache.hadoop.conf.StorageUnit;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException;
+import org.apache.hadoop.ozone.container.common.impl.ContainerLayoutVersion;
+import org.apache.hadoop.ozone.container.common.impl.ContainerSet;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet;
+import org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy;
+import org.apache.hadoop.ozone.container.keyvalue.ContainerTestVersionInfo;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils;
+import org.apache.hadoop.ozone.container.keyvalue.statemachine.background.StaleRecoveringContainerScrubbingService;
+import org.apache.ozone.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Iterator;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State.CLOSED;
+import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State.RECOVERING;
+import static org.mockito.ArgumentMatchers.anyList;
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.Mockito.mock;
+
+/**
+ * Tests to stale recovering container scrubbing service.
+ */
+@RunWith(Parameterized.class)
+public class TestStaleRecoveringContainerScrubbingService {
+
+  @Rule
+  public final TemporaryFolder tempDir = new TemporaryFolder();
+  private String datanodeUuid;
+  private OzoneConfiguration conf;
+  private HddsVolume hddsVolume;
+
+  private final ContainerLayoutVersion layout;
+  private final String schemaVersion;
+  private String clusterID;
+  private int containerIdNum = 0;
+  private MutableVolumeSet volumeSet;
+  private RoundRobinVolumeChoosingPolicy volumeChoosingPolicy;
+
+  public TestStaleRecoveringContainerScrubbingService(
+      ContainerTestVersionInfo versionInfo) {
+    this.layout = versionInfo.getLayout();
+    this.schemaVersion = versionInfo.getSchemaVersion();
+    conf = new OzoneConfiguration();
+    ContainerTestVersionInfo.setTestSchemaVersion(schemaVersion, conf);
+  }
+
+  @Parameterized.Parameters
+  public static Iterable<Object[]> parameters() {
+    return ContainerTestVersionInfo.versionParameters();
+  }
+
+  @Before
+  public void init() throws IOException {
+    File volumeDir = tempDir.newFolder();
+    conf.set(ScmConfigKeys.HDDS_DATANODE_DIR_KEY, volumeDir.getAbsolutePath());
+    conf.set(HddsConfigKeys.OZONE_METADATA_DIRS, volumeDir.getAbsolutePath());
+    datanodeUuid = UUID.randomUUID().toString();
+    hddsVolume = new HddsVolume.Builder(volumeDir.getAbsolutePath())
+        .conf(conf).datanodeUuid(datanodeUuid).clusterID(clusterID).build();
+    volumeSet = mock(MutableVolumeSet.class);
+    clusterID = UUID.randomUUID().toString();
+    volumeChoosingPolicy = mock(RoundRobinVolumeChoosingPolicy.class);
+    Mockito.when(volumeChoosingPolicy.chooseVolume(anyList(), anyLong()))
+        .thenReturn(hddsVolume);
+  }
+
+  @After
+  public void cleanup() throws IOException {
+    BlockUtils.shutdownCache(conf);
+  }
+
+  /**
+   * A helper method to create a number of containers of given state.
+   */
+  private void createTestContainers(
+      ContainerSet containerSet, int num,
+      ContainerProtos.ContainerDataProto.State state)
+      throws StorageContainerException {
+    int end = containerIdNum + num;
+    for (; containerIdNum < end; containerIdNum++) {
+      KeyValueContainerData recoveringContainerData = new KeyValueContainerData(
+          containerIdNum, layout, (long) StorageUnit.GB.toBytes(5),
+          UUID.randomUUID().toString(), datanodeUuid);
+      //create a container with recovering state
+      recoveringContainerData.setState(state);
+
+      KeyValueContainer recoveringKeyValueContainer =
+          new KeyValueContainer(recoveringContainerData,
+              conf);
+      recoveringKeyValueContainer.create(
+          volumeSet, volumeChoosingPolicy, clusterID);
+      containerSet.addContainer(recoveringKeyValueContainer);
+    }
+  }
+
+  @Test
+  public void testScrubbingStaleRecoveringContainers()
+      throws StorageContainerException, InterruptedException, TimeoutException {
+    ContainerSet containerSet = new ContainerSet();
+    StaleRecoveringContainerScrubbingService srcss =
+        new StaleRecoveringContainerScrubbingService(
+            Duration.ofMillis(50).toMillis(),
+            TimeUnit.MILLISECONDS, 10,
+            Duration.ofSeconds(300).toMillis(),
+            containerSet, conf);
+    srcss.setRecoveringTimeout(10);
+
+    srcss.start();
+    GenericTestUtils.waitFor(() -> srcss.getThreadCount() > 0, 100, 3000);
+    createTestContainers(containerSet, 5, CLOSED);
+    Thread.sleep(100L);
+    //closed container should not be scrubbed
+    Assert.assertTrue(containerSet.containerCount() == 5);
+
+    createTestContainers(containerSet, 5, RECOVERING);
+    Thread.sleep(100L);
+    //recovering container should be scrubbed since recovering timeout
+    Assert.assertTrue(containerSet.containerCount() == 5);
+    Iterator<Container<?>> it = containerSet.getContainerIterator();
+    while (it.hasNext()) {
+      Container<?> entry = it.next();
+      Assert.assertTrue(entry.getContainerState().equals(CLOSED));
+    }
+
+    //increase recovering timeout
+    srcss.setRecoveringTimeout(100000);
+    createTestContainers(containerSet, 5, RECOVERING);

Review Comment:
   To avoid this kind of sleeps, we followed an interesting approach of passing the clock ( Credit: Originally suggested by @sodonnel ). So, I would suggest to pass the clock and so, that we can fast forward the clock.
   Please refer: https://github.com/apache/ozone/pull/3008/files



##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java:
##########
@@ -187,6 +187,23 @@ public final class OzoneConfigKeys {
   public static final int OZONE_BLOCK_DELETING_SERVICE_WORKERS_DEFAULT
       = 10;
 
+  /**
+   * Configuration properties for Ozone Recovering Container Scrubbing Service.
+   */

Review Comment:
   If we want to keep them as dev configs, we don;t need to add i xml, but may need to add into skipping lists in tests I guess.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3591:
URL: https://github.com/apache/ozone/pull/3591#issuecomment-1185166985

   I am ok if we want to take up the clock changes into followup JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3591:
URL: https://github.com/apache/ozone/pull/3591#issuecomment-1185356016

   thanks @umamaheswararao for the review, i have fix the comments and take up the clock changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3591:
URL: https://github.com/apache/ozone/pull/3591#discussion_r922169585


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -989,8 +991,14 @@ public void markContainerForClose(Container container)
       throws IOException {
     container.writeLock();
     try {
+      ContainerProtos.ContainerDataProto.State state =
+          container.getContainerState();
       // Move the container to CLOSING state only if it's OPEN/RECOVERING
-      if (HddsUtils.isOpenToWriteState(container.getContainerState())) {
+      if (HddsUtils.isOpenToWriteState(state)) {
+        if (state == RECOVERING) {

Review Comment:
   oh yeah. It was my overlook. I realized, you actually created a separateMap in ContainerSet class and we are just removing from that. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a diff in pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on code in PR #3591:
URL: https://github.com/apache/ozone/pull/3591#discussion_r921893486


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -989,8 +991,14 @@ public void markContainerForClose(Container container)
       throws IOException {
     container.writeLock();
     try {
+      ContainerProtos.ContainerDataProto.State state =
+          container.getContainerState();
       // Move the container to CLOSING state only if it's OPEN/RECOVERING
-      if (HddsUtils.isOpenToWriteState(container.getContainerState())) {
+      if (HddsUtils.isOpenToWriteState(state)) {
+        if (state == RECOVERING) {

Review Comment:
   if we mark a recovering container for close  at datanode, it means this replica has been recovered successfully, so it should not be scrubbed by `staleRecoveringContainerScrubbingService`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #3591: HDDS-6987. EC: Implement RECOVERING Container Scrubber

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged PR #3591:
URL: https://github.com/apache/ozone/pull/3591


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org